This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
ClosedPublic

Authored by rampitec on Jun 13 2019, 3:06 PM.


Details

Reviewers

kzhuravl
msearles
arsenm

Commits

rG68a2fef9ae5b: [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
rL363339: [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32

Summary

Another independent part of wave32 support, split out to reduce the size of D63204.

Diff Detail

Event Timeline

rampitec created this revision. Jun 13 2019, 3:06 PM

Herald added subscribers: t-tye, tpr, dstuttard and 5 others. Jun 13 2019, 3:06 PM

arsenm added inline comments. Jun 13 2019, 3:13 PM

lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
252	This one should query the wavesize?
lib/Target/AMDGPU/SIInstructions.td
606–614	I don't think these should ever get here? These should have been turned into AMDGPUSsetcc

rampitec marked 2 inline comments as done. Jun 13 2019, 3:19 PM

rampitec added inline comments.

lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
252	They actually don't; they are anyint, and 64 fits it.
lib/Target/AMDGPU/SIInstructions.td
606–614	That's an in-place replacement. We can explore whether it is still used, but keeping just the wave64 version is clearly wrong.
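
The exchange about AMDGPUAtomicOptimizer.cpp turns on how the lowering in this revision treats the requested result type: the compare is produced at the wavefront width and then zero-extended or truncated to the integer type the caller asked for, so an i64 request still works on wave32. A minimal standalone sketch of that width adjustment follows, using plain integers rather than SelectionDAG nodes. The names `lowBits` and `zextOrTruncMask` are hypothetical and do not exist in LLVM.

```cpp
#include <cstdint>

// Hypothetical helper, not part of LLVM: a mask with the low `bits` bits set.
// Widths are assumed to be in the range 1..64.
constexpr std::uint64_t lowBits(unsigned bits) {
  return bits >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << bits) - 1);
}

// Models adjusting a lane mask computed at `fromBits` (the wavefront size)
// to a result type that is `toBits` wide.
constexpr std::uint64_t zextOrTruncMask(std::uint64_t mask, unsigned fromBits,
                                        unsigned toBits) {
  // Keep only the bits that exist at the source width (zero-extension
  // semantics), then drop anything that does not fit the destination width.
  return mask & lowBits(fromBits) & lowBits(toBits);
}

// A wave32 ballot requested as i64: the upper 32 bits are simply zero.
static_assert(zextOrTruncMask(0xFFFF0001u, 32, 64) == 0x00000000FFFF0001ull, "");
// A wave64 ballot requested as i32: the upper lanes are truncated away.
static_assert(zextOrTruncMask(0xABCD00000000000Full, 64, 32) == 0x0000000Full, "");
```

Under this model, asking for a 64-bit result on a 32-lane wave loses nothing, which is one way to read the "64 fits it" reply.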

LGTM

This revision is now accepted and ready to land. Jun 13 2019, 3:51 PM

Closed by commit rL363339: [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32 (authored by rampitec). Jun 13 2019, 4:44 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Jun 13 2019, 4:44 PM

Hi,

This change introduces a bunch of CTS failures on GFX8/GFX9. It looks like IR validation fails now; see below:

$ RADV_DEBUG=checkir,nocache ./deqp-vk --deqp-case=dEQP-VK.glsl.derivate.dfdxfine.dynamic_loop.vec2_highp
Writing test log into TestResults.qpa
dEQP Core git-c2a38f956feeb2e141c76829c6ecb41c9f68d253 (0xc2a38f95) starting..

target implementation = 'Default'

Test case 'dEQP-VK.glsl.derivate.dfdxfine.dynamic_loop.vec2_highp'..
Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.icmp.i64.i32
i64 (i32, i32, i32)* @llvm.amdgcn.icmp.i32
in function main
LLVM ERROR: Broken function found, compilation aborted!

Can you investigate please?
Thanks!

In reply to D63301#1543215 (@hakzsam's CTS failure report above):

Intrinsic mangling has changed. Normally LLVM can handle that gracefully. How do you load a module? Can you attach a testcase, please? I assume auto-upgrade code may be needed, but we need to understand RADV's workflow.

Do we need to update the intrinsic name in Mesa?
Btw, this is not specific to RADV; RadeonSI is probably affected too, because the LLVM backend is common.

Sorry for the noise.
Should be fixed with https://patchwork.freedesktop.org/patch/310419/?series=62097&rev=1

In reply to D63301#1543248 (@hakzsam's message above):

Yes, thanks! If a module with the old name is created on the fly, it has no chance to be upgraded. Thus clang was updated, and Mesa too.
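
The verifier message in the failure report follows from how overloaded intrinsic names are built: the base name is followed by one type suffix per overloaded type. Before this revision only the operand type of `llvm.amdgcn.icmp` was overloaded; afterwards the result type is overloaded as well, so the expected name gains a second suffix. The sketch below is a simplified model of that naming scheme, not LLVM's implementation; `mangleIntrinsicName` is a hypothetical helper and takes the type suffixes as ready-made strings.

```cpp
#include <string>
#include <vector>

// Simplified model: base name plus one ".<type>" suffix per overloaded type,
// in declaration order. Illustration only; this is not LLVM's code.
std::string mangleIntrinsicName(const std::string &base,
                                const std::vector<std::string> &overloadedTypes) {
  std::string name = base;
  for (const std::string &ty : overloadedTypes) {
    name += '.';
    name += ty;
  }
  return name;
}
```

With the old signature the list is `{"i32"}`, giving `llvm.amdgcn.icmp.i32`, which is the name the frontend was still emitting. With the new signature the list is `{"i64", "i32"}`, giving `llvm.amdgcn.icmp.i64.i32`, the name the verifier asks for.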

Revision Contents

Path	Size
include/llvm/IR/IntrinsicsAMDGPU.td	22 lines
lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp	3 lines
lib/Target/AMDGPU/AMDGPUSubtarget.h	4 lines
lib/Target/AMDGPU/SIAnnotateControlFlow.cpp	44 lines
lib/Target/AMDGPU/SIISelLowering.cpp	32 lines
lib/Target/AMDGPU/SIInstructions.td	7 lines
lib/Transforms/InstCombine/InstCombineCalls.cpp	4 lines
test/CodeGen/AMDGPU/diverge-switch-default.ll	4 lines
test/CodeGen/AMDGPU/loop_break.ll	36 lines
test/CodeGen/AMDGPU/multi-divergent-exit-region.ll	66 lines
test/CodeGen/AMDGPU/multilevel-break.ll	8 lines
test/CodeGen/AMDGPU/nested-loop-conditions.ll	32 lines
test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll	4 lines
test/CodeGen/AMDGPU/si-annotatecfg-multiple-backedges.ll	12 lines
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll	274 lines
test/Verifier/AMDGPU/intrinsic-immarg.ll	12 lines

Diff 204635

include/llvm/IR/IntrinsicsAMDGPU.td

[... 181 lines not shown ...]
 // Set EXEC according to a thread count packed in an SGPR input:
 // thread_count = (input >> bitoffset) & 0x7f;
 // This is always moved to the beginning of the basic block.
 def int_amdgcn_init_exec_from_input : Intrinsic<[],
   [llvm_i32_ty, // 32-bit SGPR input
    llvm_i32_ty], // bit offset of the thread count
   [IntrConvergent]>;

+def int_amdgcn_wavefrontsize :
+  GCCBuiltin<"__builtin_amdgcn_wavefrontsize">,
+  Intrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+

 //===----------------------------------------------------------------------===//
 // Instruction Intrinsics
 //===----------------------------------------------------------------------===//

 // The first parameter is s_sendmsg immediate (i16),
 // the second one is copied to m0
 def int_amdgcn_s_sendmsg : GCCBuiltin<"__builtin_amdgcn_s_sendmsg">,
[... 1,099 lines not shown ...]

 def int_amdgcn_cvt_pk_u8_f32 :
   GCCBuiltin<"__builtin_amdgcn_cvt_pk_u8_f32">,
   Intrinsic<[llvm_i32_ty], [llvm_float_ty, llvm_i32_ty, llvm_i32_ty],
   [IntrNoMem, IntrSpeculatable]
 >;

 def int_amdgcn_icmp :
-  Intrinsic<[llvm_i64_ty], [llvm_anyint_ty, LLVMMatchType<0>, llvm_i32_ty],
+  Intrinsic<[llvm_anyint_ty], [llvm_anyint_ty, LLVMMatchType<1>, llvm_i32_ty],
             [IntrNoMem, IntrConvergent, ImmArg<2>]>;

 def int_amdgcn_fcmp :
-  Intrinsic<[llvm_i64_ty], [llvm_anyfloat_ty, LLVMMatchType<0>, llvm_i32_ty],
+  Intrinsic<[llvm_anyint_ty], [llvm_anyfloat_ty, LLVMMatchType<1>, llvm_i32_ty],
             [IntrNoMem, IntrConvergent, ImmArg<2>]>;

 def int_amdgcn_readfirstlane :
   GCCBuiltin<"__builtin_amdgcn_readfirstlane">,
   Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrNoMem, IntrConvergent]>;

 // The lane argument must be uniform across the currently active threads of the
 // current wave. Otherwise, the result is undefined.
[... 253 lines not shown; context: Intrinsic< ...]
   ],
   [IntrNoMem, IntrSpeculatable, ImmArg<3>]
 >;

 //===----------------------------------------------------------------------===//
 // Special Intrinsics for backend internal use only. No frontend
 // should emit calls to these.
 // ===----------------------------------------------------------------------===//
-def int_amdgcn_if : Intrinsic<[llvm_i1_ty, llvm_i64_ty],
+def int_amdgcn_if : Intrinsic<[llvm_i1_ty, llvm_anyint_ty],
   [llvm_i1_ty], [IntrConvergent]
 >;

-def int_amdgcn_else : Intrinsic<[llvm_i1_ty, llvm_i64_ty],
-  [llvm_i64_ty], [IntrConvergent]
+def int_amdgcn_else : Intrinsic<[llvm_i1_ty, llvm_anyint_ty],
+  [llvm_anyint_ty], [IntrConvergent]
 >;

-def int_amdgcn_if_break : Intrinsic<[llvm_i64_ty],
-  [llvm_i1_ty, llvm_i64_ty], [IntrNoMem, IntrConvergent]
+def int_amdgcn_if_break : Intrinsic<[llvm_anyint_ty],
+  [llvm_i1_ty, llvm_anyint_ty], [IntrNoMem, IntrConvergent]
 >;

 def int_amdgcn_loop : Intrinsic<[llvm_i1_ty],
-  [llvm_i64_ty], [IntrConvergent]
+  [llvm_anyint_ty], [IntrConvergent]
 >;

-def int_amdgcn_end_cf : Intrinsic<[], [llvm_i64_ty], [IntrConvergent]>;
+def int_amdgcn_end_cf : Intrinsic<[], [llvm_anyint_ty], [IntrConvergent]>;

 // Represent unreachable in a divergent region.
 def int_amdgcn_unreachable : Intrinsic<[], [], [IntrConvergent]>;

 // Emit 2.5 ulp, no denormal division. Should only be inserted by
 // pass based on !fpmath metadata.
 def int_amdgcn_fdiv_fast : Intrinsic<
   [llvm_float_ty], [llvm_float_ty, llvm_float_ty],
   [IntrNoMem, IntrSpeculatable]
 >;
 }

lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

[... 243 lines not shown; context: void AMDGPUAtomicOptimizer::optimizeAtomic(Instruction &I, ...]

   // This is the value in the atomic operation we need to combine in order to
   // reduce the number of atomic operations.
   Value *const V = I.getOperand(ValIdx);

   // We need to know how many lanes are active within the wavefront, and we do
   // this by doing a ballot of active lanes.
   CallInst *const Ballot =
-      B.CreateIntrinsic(Intrinsic::amdgcn_icmp, {B.getInt32Ty()},
+      B.CreateIntrinsic(Intrinsic::amdgcn_icmp,
+                        {B.getInt64Ty(), B.getInt32Ty()},
                         {B.getInt32(1), B.getInt32(0), B.getInt32(33)});
   [inline comment, arsenm] This one should query the wavesize?
   [inline comment, rampitec] They actually don't, they are anyint and 64 fits it.

   // We need to know how many lanes are active within the wavefront that are
   // below us. If we counted each lane linearly starting from 0, a lane is
   // below us only if its associated index was less than ours. We do this by
   // using the mbcnt intrinsic.
   Value *const BitCast = B.CreateBitCast(Ballot, VecTy);
   Value *const ExtractLo = B.CreateExtractElement(BitCast, B.getInt32(0));
[... 178 lines not shown ...]
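
The comments in this hunk describe two steps: a ballot of active lanes (predicate 33 is `ICMP_NE`, so comparing 1 against 0 is true in every active lane), followed by a count of the active lanes with a lower index. As a rough host-side illustration of the second step, assuming the ballot is available as a plain 64-bit integer, the count is a population count over the bits below the current lane. The function names below are hypothetical and are not the mbcnt intrinsics themselves.

```cpp
#include <cstdint>

// Count set bits without relying on compiler builtins.
inline unsigned popcount64(std::uint64_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1) // clears the lowest set bit each iteration
    ++n;
  return n;
}

// Hypothetical host-side model: `ballot` has bit i set when lane i is active.
// Returns how many active lanes have an index lower than `lane` (0..63).
inline unsigned activeLanesBelow(std::uint64_t ballot, unsigned lane) {
  const std::uint64_t below = (std::uint64_t{1} << lane) - 1; // bits [0, lane)
  return popcount64(ballot & below);
}
```

For a ballot of `0xB` (lanes 0, 1 and 3 active), lane 3 sees two active lanes below it and lane 0 sees none.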

lib/Target/AMDGPU/AMDGPUSubtarget.h

[... 1,022 lines not shown; context: public: ...]
   /// subtarget's specifications, or does not meet number of waves per execution
   /// unit requirement.
   unsigned getMaxNumVGPRs(const MachineFunction &MF) const;

   void getPostRAMutations(
       std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations)
       const override;

+  bool isWave32() const {
+    return WavefrontSize == 32;
+  }
+
   /// \returns Maximum number of work groups per compute unit supported by the
   /// subtarget and limited by given \p FlatWorkGroupSize.
   unsigned getMaxWorkGroupsPerCU(unsigned FlatWorkGroupSize) const override {
     return AMDGPU::IsaInfo::getMaxWorkGroupsPerCU(this, FlatWorkGroupSize);
   }

   /// \returns Minimum flat work group size supported by the subtarget.
   unsigned getMinFlatWorkGroupSize() const override {
[... 161 lines not shown ...]

lib/Target/AMDGPU/SIAnnotateControlFlow.cpp

 //===- SIAnnotateControlFlow.cpp ------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 /// \file
 /// Annotates the control flow with hardware specific intrinsics.
 //
 //===----------------------------------------------------------------------===//

 #include "AMDGPU.h"
+#include "AMDGPUSubtarget.h"
 #include "llvm/ADT/DepthFirstIterator.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/LegacyDivergenceAnalysis.h"
 #include "llvm/Analysis/LoopInfo.h"
+#include "llvm/CodeGen/TargetPassConfig.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CFG.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/Instruction.h"
[... 22 lines not shown ...]
 using StackEntry = std::pair<BasicBlock *, Value *>;
 using StackVector = SmallVector<StackEntry, 16>;

 class SIAnnotateControlFlow : public FunctionPass {
   LegacyDivergenceAnalysis *DA;

   Type *Boolean;
   Type *Void;
-  Type *Int64;
+  Type *IntMask;
   Type *ReturnStruct;

   ConstantInt *BoolTrue;
   ConstantInt *BoolFalse;
   UndefValue *BoolUndef;
-  Constant *Int64Zero;
+  Constant *IntMaskZero;

   Function *If;
   Function *Else;
   Function *IfBreak;
   Function *Loop;
   Function *EndCf;

   DominatorTree *DT;
   StackVector Stack;

   LoopInfo *LI;

+  void initialize(Module &M, const GCNSubtarget &ST);
+
   bool isUniform(BranchInst *T);

   bool isTopOfStack(BasicBlock *BB);

   Value *popSaved();

   void push(BasicBlock *BB, Value *Saved);

[... 13 lines not shown; context: class SIAnnotateControlFlow : public FunctionPass { ...]

   void closeControlFlow(BasicBlock *BB);

 public:
   static char ID;

   SIAnnotateControlFlow() : FunctionPass(ID) {}

-  bool doInitialization(Module &M) override;
-
   bool runOnFunction(Function &F) override;

   StringRef getPassName() const override { return "SI annotate control flow"; }

   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.addRequired<LoopInfoWrapperPass>();
     AU.addRequired<DominatorTreeWrapperPass>();
     AU.addRequired<LegacyDivergenceAnalysis>();
     AU.addPreserved<DominatorTreeWrapperPass>();
+    AU.addRequired<TargetPassConfig>();
     FunctionPass::getAnalysisUsage(AU);
   }
 };

 } // end anonymous namespace

 INITIALIZE_PASS_BEGIN(SIAnnotateControlFlow, DEBUG_TYPE,
                       "Annotate SI Control Flow", false, false)
 INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(LegacyDivergenceAnalysis)
+INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
 INITIALIZE_PASS_END(SIAnnotateControlFlow, DEBUG_TYPE,
                     "Annotate SI Control Flow", false, false)

 char SIAnnotateControlFlow::ID = 0;

 /// Initialize all the types and constants used in the pass
-bool SIAnnotateControlFlow::doInitialization(Module &M) {
+void SIAnnotateControlFlow::initialize(Module &M, const GCNSubtarget &ST) {
   LLVMContext &Context = M.getContext();

   Void = Type::getVoidTy(Context);
   Boolean = Type::getInt1Ty(Context);
-  Int64 = Type::getInt64Ty(Context);
-  ReturnStruct = StructType::get(Boolean, Int64);
+  IntMask = ST.isWave32() ? Type::getInt32Ty(Context)
+                          : Type::getInt64Ty(Context);
+  ReturnStruct = StructType::get(Boolean, IntMask);

   BoolTrue = ConstantInt::getTrue(Context);
   BoolFalse = ConstantInt::getFalse(Context);
   BoolUndef = UndefValue::get(Boolean);
-  Int64Zero = ConstantInt::get(Int64, 0);
+  IntMaskZero = ConstantInt::get(IntMask, 0);

-  If = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_if);
-  Else = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_else);
-  IfBreak = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_if_break);
-  Loop = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_loop);
-  EndCf = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_end_cf);
-  return false;
+  If = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_if, { IntMask });
+  Else = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_else,
+                                   { IntMask, IntMask });
+  IfBreak = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_if_break,
+                                      { IntMask, IntMask });
+  Loop = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_loop, { IntMask });
+  EndCf = Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_end_cf, { IntMask });
 }

 /// Is the branch condition uniform or did the StructurizeCFG pass
 /// consider it as such?
 bool SIAnnotateControlFlow::isUniform(BranchInst *T) {
   return DA->isUniform(T) ||
          T->getMetadata("structurizecfg.uniform") != nullptr;
 }
[... 93 lines not shown; context: if (isUniform(Term)) ...]
     return;

   BasicBlock *BB = Term->getParent();
   llvm::Loop *L = LI->getLoopFor(BB);
   if (!L)
     return;

   BasicBlock *Target = Term->getSuccessor(1);
-  PHINode *Broken = PHINode::Create(Int64, 0, "phi.broken", &Target->front());
+  PHINode *Broken = PHINode::Create(IntMask, 0, "phi.broken", &Target->front());

   Value *Cond = Term->getCondition();
   Term->setCondition(BoolTrue);
   Value *Arg = handleLoopCondition(Cond, Broken, L, Term);

   for (BasicBlock *Pred : predecessors(Target)) {
-    Value *PHIValue = Int64Zero;
+    Value *PHIValue = IntMaskZero;
     if (Pred == BB) // Remember the value of the previous iteration.
       PHIValue = Arg;
     // If the backedge from Pred to Target could be executed before the exit
     // of the loop at BB, it should not reset or change "Broken", which keeps
     // track of the number of threads exited the loop at BB.
     else if (L->contains(Pred) && DT->dominates(Pred, BB))
       PHIValue = Broken;
     Broken->addIncoming(PHIValue, Pred);
[... 34 lines not shown ...]
 }

 /// Annotate the control flow with intrinsics so the backend can
 /// recognize if/then/else and loops.
 bool SIAnnotateControlFlow::runOnFunction(Function &F) {
   DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
   LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
   DA = &getAnalysis<LegacyDivergenceAnalysis>();
+  TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();
+  const TargetMachine &TM = TPC.getTM<TargetMachine>();
+
+  initialize(*F.getParent(), TM.getSubtarget<GCNSubtarget>(F));

   for (df_iterator<BasicBlock *> I = df_begin(&F.getEntryBlock()),
        E = df_end(&F.getEntryBlock()); I != E; ++I) {
     BasicBlock *BB = *I;
     BranchInst *Term = dyn_cast<BranchInst>(BB->getTerminator());

     if (!Term || Term->isUnconditional()) {
       if (isTopOfStack(BB))
[... 39 lines not shown ...]
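
The central change in this file is that the lane-mask type is no longer hard-coded to 64 bits but chosen from the subtarget's wavefront size, once per function. The standalone sketch below mirrors that selection with host types only. `SubtargetModel`, `laneMaskBits` and `LaneMask` are hypothetical names for illustration; only `isWave32()` and `WavefrontSize` echo names from the patch.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the subtarget query added by the patch.
struct SubtargetModel {
  unsigned WavefrontSize; // assumed to be 32 or 64
  bool isWave32() const { return WavefrontSize == 32; }
};

// Bit width of the integer type that holds one bit per lane.
inline unsigned laneMaskBits(const SubtargetModel &ST) {
  return ST.isWave32() ? 32u : 64u;
}

// Compile-time counterpart: pick a host integer type for a given wave size.
template <unsigned WaveSize>
using LaneMask = typename std::conditional<WaveSize == 32, std::uint32_t,
                                           std::uint64_t>::type;

static_assert(sizeof(LaneMask<32>) == 4, "wave32 mask is 32 bits wide");
static_assert(sizeof(LaneMask<64>) == 8, "wave64 mask is 64 bits wide");
```

Because the choice depends on the function's subtarget, the patch moves the setup out of `doInitialization(Module &)` and into a helper called from `runOnFunction`.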

lib/Target/AMDGPU/SIISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 3,833 Lines • ▼ Show 20 Lines	static SDValue lowerICMPIntrinsic(const SITargetLowering &TLI,
const auto *CD = cast<ConstantSDNode>(N->getOperand(3));		const auto *CD = cast<ConstantSDNode>(N->getOperand(3));
int CondCode = CD->getSExtValue();		int CondCode = CD->getSExtValue();
if (CondCode < ICmpInst::Predicate::FIRST_ICMP_PREDICATE \|\|		if (CondCode < ICmpInst::Predicate::FIRST_ICMP_PREDICATE \|\|
CondCode > ICmpInst::Predicate::LAST_ICMP_PREDICATE)		CondCode > ICmpInst::Predicate::LAST_ICMP_PREDICATE)
return DAG.getUNDEF(VT);		return DAG.getUNDEF(VT);

ICmpInst::Predicate IcInput = static_cast<ICmpInst::Predicate>(CondCode);		ICmpInst::Predicate IcInput = static_cast<ICmpInst::Predicate>(CondCode);


SDValue LHS = N->getOperand(1);		SDValue LHS = N->getOperand(1);
SDValue RHS = N->getOperand(2);		SDValue RHS = N->getOperand(2);

SDLoc DL(N);		SDLoc DL(N);

EVT CmpVT = LHS.getValueType();		EVT CmpVT = LHS.getValueType();
if (CmpVT == MVT::i16 && !TLI.isTypeLegal(MVT::i16)) {		if (CmpVT == MVT::i16 && !TLI.isTypeLegal(MVT::i16)) {
unsigned PromoteOp = ICmpInst::isSigned(IcInput) ?		unsigned PromoteOp = ICmpInst::isSigned(IcInput) ?
ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;		ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
LHS = DAG.getNode(PromoteOp, DL, MVT::i32, LHS);		LHS = DAG.getNode(PromoteOp, DL, MVT::i32, LHS);
RHS = DAG.getNode(PromoteOp, DL, MVT::i32, RHS);		RHS = DAG.getNode(PromoteOp, DL, MVT::i32, RHS);
}		}

ISD::CondCode CCOpcode = getICmpCondCode(IcInput);		ISD::CondCode CCOpcode = getICmpCondCode(IcInput);

return DAG.getNode(AMDGPUISD::SETCC, DL, VT, LHS, RHS,		unsigned WavefrontSize = TLI.getSubtarget()->getWavefrontSize();
		EVT CCVT = EVT::getIntegerVT(*DAG.getContext(), WavefrontSize);

		SDValue SetCC = DAG.getNode(AMDGPUISD::SETCC, DL, CCVT, LHS, RHS,
DAG.getCondCode(CCOpcode));		DAG.getCondCode(CCOpcode));
		if (VT.bitsEq(CCVT))
		return SetCC;
		return DAG.getZExtOrTrunc(SetCC, DL, VT);
}		}

static SDValue lowerFCMPIntrinsic(const SITargetLowering &TLI,		static SDValue lowerFCMPIntrinsic(const SITargetLowering &TLI,
SDNode *N, SelectionDAG &DAG) {		SDNode *N, SelectionDAG &DAG) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
const auto *CD = cast<ConstantSDNode>(N->getOperand(3));		const auto *CD = cast<ConstantSDNode>(N->getOperand(3));

int CondCode = CD->getSExtValue();		int CondCode = CD->getSExtValue();
Show All 9 Lines	static SDValue lowerFCMPIntrinsic(const SITargetLowering &TLI,

if (CmpVT == MVT::f16 && !TLI.isTypeLegal(CmpVT)) {		if (CmpVT == MVT::f16 && !TLI.isTypeLegal(CmpVT)) {
Src0 = DAG.getNode(ISD::FP_EXTEND, SL, MVT::f32, Src0);		Src0 = DAG.getNode(ISD::FP_EXTEND, SL, MVT::f32, Src0);
Src1 = DAG.getNode(ISD::FP_EXTEND, SL, MVT::f32, Src1);		Src1 = DAG.getNode(ISD::FP_EXTEND, SL, MVT::f32, Src1);
}		}

FCmpInst::Predicate IcInput = static_cast<FCmpInst::Predicate>(CondCode);		FCmpInst::Predicate IcInput = static_cast<FCmpInst::Predicate>(CondCode);
ISD::CondCode CCOpcode = getFCmpCondCode(IcInput);		ISD::CondCode CCOpcode = getFCmpCondCode(IcInput);
return DAG.getNode(AMDGPUISD::SETCC, SL, VT, Src0,		unsigned WavefrontSize = TLI.getSubtarget()->getWavefrontSize();
		EVT CCVT = EVT::getIntegerVT(*DAG.getContext(), WavefrontSize);
		SDValue SetCC = DAG.getNode(AMDGPUISD::SETCC, SL, CCVT, Src0,
Src1, DAG.getCondCode(CCOpcode));		Src1, DAG.getCondCode(CCOpcode));
		if (VT.bitsEq(CCVT))
		return SetCC;
		return DAG.getZExtOrTrunc(SetCC, SL, VT);
}		}

void SITargetLowering::ReplaceNodeResults(SDNode *N,		void SITargetLowering::ReplaceNodeResults(SDNode *N,
SmallVectorImpl<SDValue> &Results,		SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
switch (N->getOpcode()) {		switch (N->getOpcode()) {
case ISD::INSERT_VECTOR_ELT: {		case ISD::INSERT_VECTOR_ELT: {
if (SDValue Res = lowerINSERT_VECTOR_ELT(SDValue(N, 0), DAG))		if (SDValue Res = lowerINSERT_VECTOR_ELT(SDValue(N, 0), DAG))
▲ Show 20 Lines • Show All 1,494 Lines • ▼ Show 20 Lines	case Intrinsic::r600_read_tidig_y:
return loadInputValue(DAG, &AMDGPU::VGPR_32RegClass, MVT::i32,		return loadInputValue(DAG, &AMDGPU::VGPR_32RegClass, MVT::i32,
SDLoc(DAG.getEntryNode()),		SDLoc(DAG.getEntryNode()),
MFI->getArgInfo().WorkItemIDY);		MFI->getArgInfo().WorkItemIDY);
case Intrinsic::amdgcn_workitem_id_z:		case Intrinsic::amdgcn_workitem_id_z:
case Intrinsic::r600_read_tidig_z:		case Intrinsic::r600_read_tidig_z:
return loadInputValue(DAG, &AMDGPU::VGPR_32RegClass, MVT::i32,		return loadInputValue(DAG, &AMDGPU::VGPR_32RegClass, MVT::i32,
SDLoc(DAG.getEntryNode()),		SDLoc(DAG.getEntryNode()),
MFI->getArgInfo().WorkItemIDZ);		MFI->getArgInfo().WorkItemIDZ);
		case Intrinsic::amdgcn_wavefrontsize:
		return DAG.getConstant(MF.getSubtarget<GCNSubtarget>().getWavefrontSize(),
		SDLoc(Op), MVT::i32);
case Intrinsic::amdgcn_s_buffer_load: {		case Intrinsic::amdgcn_s_buffer_load: {
unsigned Cache = cast<ConstantSDNode>(Op.getOperand(3))->getZExtValue();		unsigned Cache = cast<ConstantSDNode>(Op.getOperand(3))->getZExtValue();
return lowerSBuffer(VT, DL, Op.getOperand(1), Op.getOperand(2),		return lowerSBuffer(VT, DL, Op.getOperand(1), Op.getOperand(2),
DAG.getTargetConstant(Cache & 1, DL, MVT::i1), DAG);		DAG.getTargetConstant(Cache & 1, DL, MVT::i1), DAG);
}		}
case Intrinsic::amdgcn_fdiv_fast:		case Intrinsic::amdgcn_fdiv_fast:
return lowerFDIV_FAST(Op, DAG);		return lowerFDIV_FAST(Op, DAG);
case Intrinsic::amdgcn_interp_mov: {		case Intrinsic::amdgcn_interp_mov: {
▲ Show 20 Lines • Show All 188 Lines • ▼ Show 20 Lines	SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
case Intrinsic::amdgcn_wwm: {		case Intrinsic::amdgcn_wwm: {
SDValue Src = Op.getOperand(1);		SDValue Src = Op.getOperand(1);
return SDValue(DAG.getMachineNode(AMDGPU::WWM, DL, Src.getValueType(), Src),		return SDValue(DAG.getMachineNode(AMDGPU::WWM, DL, Src.getValueType(), Src),
0);		0);
}		}
case Intrinsic::amdgcn_fmad_ftz:		case Intrinsic::amdgcn_fmad_ftz:
return DAG.getNode(AMDGPUISD::FMAD_FTZ, DL, VT, Op.getOperand(1),		return DAG.getNode(AMDGPUISD::FMAD_FTZ, DL, VT, Op.getOperand(1),
Op.getOperand(2), Op.getOperand(3));		Op.getOperand(2), Op.getOperand(3));

		case Intrinsic::amdgcn_if_break:
		return SDValue(DAG.getMachineNode(AMDGPU::SI_IF_BREAK, DL, VT,
		Op->getOperand(1), Op->getOperand(2)), 0);

default:		default:
if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))		AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))
return lowerImage(Op, ImageDimIntr, DAG);		return lowerImage(Op, ImageDimIntr, DAG);

return Op;		return Op;
}		}
}		}
▲ Show 20 Lines • Show All 881 Lines • ▼ Show 20 Lines	case Intrinsic::amdgcn_struct_buffer_store_format: {
EVT VDataType = VData.getValueType().getScalarType();		EVT VDataType = VData.getValueType().getScalarType();
if (VDataType == MVT::i8 \|\| VDataType == MVT::i16)		if (VDataType == MVT::i8 \|\| VDataType == MVT::i16)
return handleByteShortBufferStores(DAG, VDataType, DL, Ops, M);		return handleByteShortBufferStores(DAG, VDataType, DL, Ops, M);

return DAG.getMemIntrinsicNode(Opc, DL, Op->getVTList(), Ops,		return DAG.getMemIntrinsicNode(Opc, DL, Op->getVTList(), Ops,
M->getMemoryVT(), M->getMemOperand());		M->getMemoryVT(), M->getMemOperand());
}		}

		case Intrinsic::amdgcn_end_cf:
		return SDValue(DAG.getMachineNode(AMDGPU::SI_END_CF, DL, MVT::Other,
		Op->getOperand(2), Chain), 0);

default: {		default: {
if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))		AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))
return lowerImage(Op, ImageDimIntr, DAG);		return lowerImage(Op, ImageDimIntr, DAG);

return Op;		return Op;
}		}
}		}

lib/Target/AMDGPU/SIInstructions.td


	def : Pat <			def : Pat <
	(int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),			(int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
	(SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))			(SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
	>;			>;

	// TODO: we could add more variants for other types of conditionals			// TODO: we could add more variants for other types of conditionals

	def : Pat <			def : Pat <
	(int_amdgcn_icmp i1:$src, (i1 0), (i32 33)),			(i64 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
				(COPY $src) // Return the SGPRs representing i1 src
				>;

				def : Pat <
				(i32 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
	(COPY $src) // Return the SGPRs representing i1 src			(COPY $src) // Return the SGPRs representing i1 src
	>;			>;
				arsenm: I don't think these should ever get here? These should have been turned into AMDGPUSsetcc.
				rampitec: That's an in-place replacement. We can explore whether it is still used, but keeping just the wave64 version is clearly wrong.
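
The exchange above is about lane-mask width: the patch keeps the `i64` pattern and adds an `i32` one beside it. As a rough illustration only (this is not code from the patch, and `laneMaskBits` and `allLanesMask` are hypothetical helpers), the mask width can be thought of as a function of the wavefront size, assuming only wave32 and wave64 exist:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: width in bits of a lane mask, one bit per lane.
// Assumption: the only wavefront sizes are 32 and 64.
constexpr unsigned laneMaskBits(unsigned WaveSize) {
  return WaveSize == 32 ? 32u : 64u;
}

// Hypothetical helper: a mask with one bit set for every lane in the wave.
constexpr std::uint64_t allLanesMask(unsigned WaveSize) {
  return WaveSize == 32 ? 0xFFFFFFFFull : ~0ull;
}
```

Under this view a wave64-only pattern covers just one of the two mask types, which is the point rampitec makes in the reply.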

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// VOP1 Patterns			// VOP1 Patterns
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	let OtherPredicates = [UnsafeFPMath] in {			let OtherPredicates = [UnsafeFPMath] in {

	//def : RcpPat<V_RCP_F64_e32, f64>;			//def : RcpPat<V_RCP_F64_e32, f64>;

lib/Transforms/InstCombine/InstCombineCalls.cpp

Show First 20 Lines • Show All 3,727 Lines • ▼ Show 20 Lines	if (match(Src1, m_Zero()) &&
SrcLHS = Builder.CreateZExt(SrcLHS, CmpTy);		SrcLHS = Builder.CreateZExt(SrcLHS, CmpTy);
SrcRHS = Builder.CreateZExt(SrcRHS, CmpTy);		SrcRHS = Builder.CreateZExt(SrcRHS, CmpTy);
}		}
}		}
} else if (!Ty->isFloatTy() && !Ty->isDoubleTy() && !Ty->isHalfTy())		} else if (!Ty->isFloatTy() && !Ty->isDoubleTy() && !Ty->isHalfTy())
break;		break;

Function *NewF =		Function *NewF =
Intrinsic::getDeclaration(II->getModule(), NewIID, SrcLHS->getType());		Intrinsic::getDeclaration(II->getModule(), NewIID,
		{ II->getType(),
		SrcLHS->getType() });
Value *Args[] = { SrcLHS, SrcRHS,		Value *Args[] = { SrcLHS, SrcRHS,
ConstantInt::get(CC->getType(), SrcPred) };		ConstantInt::get(CC->getType(), SrcPred) };
CallInst *NewCall = Builder.CreateCall(NewF, Args);		CallInst *NewCall = Builder.CreateCall(NewF, Args);
NewCall->takeName(II);		NewCall->takeName(II);
return replaceInstUsesWith(*II, NewCall);		return replaceInstUsesWith(*II, NewCall);
}		}

break;		break;
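
The hunk above changes the declaration lookup to pass two overload types (the call's result type and the operand type) instead of one. The updated tests in this diff show the visible effect on names, for example `llvm.amdgcn.end.cf.i64` and `llvm.amdgcn.if.break.i64.i64`. A minimal sketch of that suffix scheme, assuming each overloaded type contributes one `.<type>` suffix in order; `mangledIntrinsicName` is a hypothetical helper, not an LLVM API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of LLVM: build an intrinsic name by appending
// one ".<type>" suffix per overloaded type, in the order the types are given.
std::string mangledIntrinsicName(const std::string &Base,
                                 const std::vector<std::string> &OverloadTys) {
  std::string Name = Base;
  for (const std::string &Ty : OverloadTys) {
    Name += '.';
    Name += Ty;
  }
  return Name;
}
```

With an `i64` result and an `i32` operand, this sketch would produce `llvm.amdgcn.icmp.i64.i32` for the base name `llvm.amdgcn.icmp`.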

test/CodeGen/AMDGPU/diverge-switch-default.ll

	; is sensitive to optimizations; so we just ensure that the relevant			; is sensitive to optimizations; so we just ensure that the relevant
	; operations in the block body are indeed in the same block.			; operations in the block body are indeed in the same block.

	; CHECK: [[PHI:%[a-zA-Z0-9._]+]] = phi i64			; CHECK: [[PHI:%[a-zA-Z0-9._]+]] = phi i64
	; CHECK-NOT: {{ br }}			; CHECK-NOT: {{ br }}
	; CHECK: load i8			; CHECK: load i8
	; CHECK-NOT: {{ br }}			; CHECK-NOT: {{ br }}
	; CHECK: [[ICMP:%[a-zA-Z0-9._]+]] = icmp eq			; CHECK: [[ICMP:%[a-zA-Z0-9._]+]] = icmp eq
	; CHECK: [[IF:%[a-zA-Z0-9._]+]] = call i64 @llvm.amdgcn.if.break(i1 [[ICMP]], i64 [[PHI]])			; CHECK: [[IF:%[a-zA-Z0-9._]+]] = call i64 @llvm.amdgcn.if.break.i64.i64(i1 [[ICMP]], i64 [[PHI]])
	; CHECK: [[LOOP:%[a-zA-Z0-9._]+]] = call i1 @llvm.amdgcn.loop(i64 [[IF]])			; CHECK: [[LOOP:%[a-zA-Z0-9._]+]] = call i1 @llvm.amdgcn.loop.i64(i64 [[IF]])
	; CHECK: br i1 [[LOOP]]			; CHECK: br i1 [[LOOP]]

	sw.while:			sw.while:
	%p = phi i8 addrspace(1)* [ %gep_in, %sw.epilog ], [ %incdec.ptr, %sw.while ]			%p = phi i8 addrspace(1)* [ %gep_in, %sw.epilog ], [ %incdec.ptr, %sw.while ]
	%count = phi i32 [ 0, %sw.epilog ], [ %count.inc, %sw.while ]			%count = phi i32 [ 0, %sw.epilog ], [ %count.inc, %sw.while ]
	%char = load i8, i8 addrspace(1)* %p, align 1			%char = load i8, i8 addrspace(1)* %p, align 1
	%tobool = icmp eq i8 %char, 0			%tobool = icmp eq i8 %char, 0
	%incdec.ptr = getelementptr inbounds i8, i8 addrspace(1)* %p, i64 1			%incdec.ptr = getelementptr inbounds i8, i8 addrspace(1)* %p, i64 1

test/CodeGen/AMDGPU/loop_break.ll


	; OPT: bb4:			; OPT: bb4:
	; OPT: load volatile			; OPT: load volatile
	; OPT: icmp slt i32			; OPT: icmp slt i32
	; OPT: xor i1 %cmp1			; OPT: xor i1 %cmp1
	; OPT: br label %Flow			; OPT: br label %Flow

	; OPT: Flow:			; OPT: Flow:
	; OPT: call i64 @llvm.amdgcn.if.break(			; OPT: call i64 @llvm.amdgcn.if.break.i64.i64(
	; OPT: call i1 @llvm.amdgcn.loop(i64			; OPT: call i1 @llvm.amdgcn.loop.i64(i64
	; OPT: br i1 %{{[0-9]+}}, label %bb9, label %bb1			; OPT: br i1 %{{[0-9]+}}, label %bb9, label %bb1

	; OPT: bb9:			; OPT: bb9:
	; OPT: call void @llvm.amdgcn.end.cf(i64			; OPT: call void @llvm.amdgcn.end.cf.i64(i64

	; GCN-LABEL: {{^}}break_loop:			; GCN-LABEL: {{^}}break_loop:
	; GCN: s_mov_b64 [[OUTER_MASK:s\[[0-9]+:[0-9]+\]]], 0{{$}}			; GCN: s_mov_b64 [[OUTER_MASK:s\[[0-9]+:[0-9]+\]]], 0{{$}}

	; GCN: [[LOOP_ENTRY:BB[0-9]+_[0-9]+]]: ; %bb1			; GCN: [[LOOP_ENTRY:BB[0-9]+_[0-9]+]]: ; %bb1
	; GCN: v_cmp_lt_i32_e32 vcc, -1			; GCN: v_cmp_lt_i32_e32 vcc, -1
	; GCN: s_and_b64 vcc, exec, vcc			; GCN: s_and_b64 vcc, exec, vcc
	; GCN: s_or_b64 [[INNER_MASK:s\[[0-9]+:[0-9]+\]]], [[INNER_MASK]], exec			; GCN: s_or_b64 [[INNER_MASK:s\[[0-9]+:[0-9]+\]]], [[INNER_MASK]], exec
	▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	; OPT: bb4:			; OPT: bb4:
	; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4			; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4
	; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load			; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load
	; OPT-NEXT: br label %Flow			; OPT-NEXT: br label %Flow

	; OPT: Flow:			; OPT: Flow:
	; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]			; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]
	; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ undef, %bb1 ]			; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ undef, %bb1 ]
	; OPT-NEXT: %0 = call i64 @llvm.amdgcn.if.break(i1 %tmp3, i64 %phi.broken)			; OPT-NEXT: %0 = call i64 @llvm.amdgcn.if.break.i64.i64(i1 %tmp3, i64 %phi.broken)
	; OPT-NEXT: %1 = call i1 @llvm.amdgcn.loop(i64 %0)			; OPT-NEXT: %1 = call i1 @llvm.amdgcn.loop.i64(i64 %0)
	; OPT-NEXT: br i1 %1, label %bb9, label %bb1			; OPT-NEXT: br i1 %1, label %bb9, label %bb1

	; OPT: bb9: ; preds = %Flow			; OPT: bb9: ; preds = %Flow
	; OPT-NEXT: call void @llvm.amdgcn.end.cf(i64 %0)			; OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %0)
	; OPT-NEXT: store volatile i32 7			; OPT-NEXT: store volatile i32 7
	; OPT-NEXT: ret void			; OPT-NEXT: ret void
	define amdgpu_kernel void @undef_phi_cond_break_loop(i32 %arg) #0 {			define amdgpu_kernel void @undef_phi_cond_break_loop(i32 %arg) #0 {
	bb:			bb:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%tmp = sub i32 %id, %arg			%tmp = sub i32 %id, %arg
	br label %bb1			br label %bb1

	Show All 32 Lines
	; OPT: bb4:			; OPT: bb4:
	; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4			; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4
	; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load			; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load
	; OPT-NEXT: br label %Flow			; OPT-NEXT: br label %Flow

	; OPT: Flow:			; OPT: Flow:
	; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]			; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]
	; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ icmp ne (i32 addrspace(3)* inttoptr (i32 4 to i32 addrspace(3)), i32 addrspace(3) @lds), %bb1 ]			; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ icmp ne (i32 addrspace(3)* inttoptr (i32 4 to i32 addrspace(3)), i32 addrspace(3) @lds), %bb1 ]
	; OPT-NEXT: %0 = call i64 @llvm.amdgcn.if.break(i1 %tmp3, i64 %phi.broken)			; OPT-NEXT: %0 = call i64 @llvm.amdgcn.if.break.i64.i64(i1 %tmp3, i64 %phi.broken)
	; OPT-NEXT: %1 = call i1 @llvm.amdgcn.loop(i64 %0)			; OPT-NEXT: %1 = call i1 @llvm.amdgcn.loop.i64(i64 %0)
	; OPT-NEXT: br i1 %1, label %bb9, label %bb1			; OPT-NEXT: br i1 %1, label %bb9, label %bb1

	; OPT: bb9: ; preds = %Flow			; OPT: bb9: ; preds = %Flow
	; OPT-NEXT: call void @llvm.amdgcn.end.cf(i64 %0)			; OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %0)
	; OPT-NEXT: store volatile i32 7			; OPT-NEXT: store volatile i32 7
	; OPT-NEXT: ret void			; OPT-NEXT: ret void
	define amdgpu_kernel void @constexpr_phi_cond_break_loop(i32 %arg) #0 {			define amdgpu_kernel void @constexpr_phi_cond_break_loop(i32 %arg) #0 {
	bb:			bb:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%tmp = sub i32 %id, %arg			%tmp = sub i32 %id, %arg
	br label %bb1			br label %bb1

	Show All 29 Lines
	; OPT: bb4:			; OPT: bb4:
	; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4			; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4
	; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load			; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load
	; OPT-NEXT: br label %Flow			; OPT-NEXT: br label %Flow

	; OPT: Flow:			; OPT: Flow:
	; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]			; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]
	; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ true, %bb1 ]			; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ true, %bb1 ]
	; OPT-NEXT: %0 = call i64 @llvm.amdgcn.if.break(i1 %tmp3, i64 %phi.broken)			; OPT-NEXT: %0 = call i64 @llvm.amdgcn.if.break.i64.i64(i1 %tmp3, i64 %phi.broken)
	; OPT-NEXT: %1 = call i1 @llvm.amdgcn.loop(i64 %0)			; OPT-NEXT: %1 = call i1 @llvm.amdgcn.loop.i64(i64 %0)
	; OPT-NEXT: br i1 %1, label %bb9, label %bb1			; OPT-NEXT: br i1 %1, label %bb9, label %bb1

	; OPT: bb9: ; preds = %Flow			; OPT: bb9: ; preds = %Flow
	; OPT-NEXT: call void @llvm.amdgcn.end.cf(i64 %0)			; OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %0)
	; OPT-NEXT: store volatile i32 7			; OPT-NEXT: store volatile i32 7
	; OPT-NEXT: ret void			; OPT-NEXT: ret void
	define amdgpu_kernel void @true_phi_cond_break_loop(i32 %arg) #0 {			define amdgpu_kernel void @true_phi_cond_break_loop(i32 %arg) #0 {
	bb:			bb:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%tmp = sub i32 %id, %arg			%tmp = sub i32 %id, %arg
	br label %bb1			br label %bb1

	Show All 28 Lines
	; OPT: bb4:			; OPT: bb4:
	; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4			; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4
	; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load			; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load
	; OPT-NEXT: br label %Flow			; OPT-NEXT: br label %Flow

	; OPT: Flow:			; OPT: Flow:
	; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]			; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]
	; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ false, %bb1 ]			; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ false, %bb1 ]
	; OPT-NEXT: %0 = call i64 @llvm.amdgcn.if.break(i1 %tmp3, i64 %phi.broken)			; OPT-NEXT: %0 = call i64 @llvm.amdgcn.if.break.i64.i64(i1 %tmp3, i64 %phi.broken)
	; OPT-NEXT: %1 = call i1 @llvm.amdgcn.loop(i64 %0)			; OPT-NEXT: %1 = call i1 @llvm.amdgcn.loop.i64(i64 %0)
	; OPT-NEXT: br i1 %1, label %bb9, label %bb1			; OPT-NEXT: br i1 %1, label %bb9, label %bb1

	; OPT: bb9: ; preds = %Flow			; OPT: bb9: ; preds = %Flow
	; OPT-NEXT: call void @llvm.amdgcn.end.cf(i64 %0)			; OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %0)
	; OPT-NEXT: store volatile i32 7			; OPT-NEXT: store volatile i32 7
	; OPT-NEXT: ret void			; OPT-NEXT: ret void
	define amdgpu_kernel void @false_phi_cond_break_loop(i32 %arg) #0 {			define amdgpu_kernel void @false_phi_cond_break_loop(i32 %arg) #0 {
	bb:			bb:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%tmp = sub i32 %id, %arg			%tmp = sub i32 %id, %arg
	br label %bb1			br label %bb1

	Show All 33 Lines
	; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4			; OPT-NEXT: %load = load volatile i32, i32 addrspace(1)* undef, align 4
	; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load			; OPT-NEXT: %cmp1 = icmp sge i32 %tmp, %load
	; OPT-NEXT: br label %Flow			; OPT-NEXT: br label %Flow

	; OPT: Flow:			; OPT: Flow:
	; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]			; OPT-NEXT: %tmp2 = phi i32 [ %lsr.iv.next, %bb4 ], [ undef, %bb1 ]
	; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ true, %bb1 ]			; OPT-NEXT: %tmp3 = phi i1 [ %cmp1, %bb4 ], [ true, %bb1 ]
	; OPT-NEXT: %0 = xor i1 %tmp3, true			; OPT-NEXT: %0 = xor i1 %tmp3, true
	; OPT-NEXT: %1 = call i64 @llvm.amdgcn.if.break(i1 %0, i64 %phi.broken)			; OPT-NEXT: %1 = call i64 @llvm.amdgcn.if.break.i64.i64(i1 %0, i64 %phi.broken)
	; OPT-NEXT: %2 = call i1 @llvm.amdgcn.loop(i64 %1)			; OPT-NEXT: %2 = call i1 @llvm.amdgcn.loop.i64(i64 %1)
	; OPT-NEXT: br i1 %2, label %bb9, label %bb1			; OPT-NEXT: br i1 %2, label %bb9, label %bb1

	; OPT: bb9:			; OPT: bb9:
	; OPT-NEXT: call void @llvm.amdgcn.end.cf(i64 %1)			; OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %1)
	; OPT-NEXT: store volatile i32 7, i32 addrspace(3)* undef			; OPT-NEXT: store volatile i32 7, i32 addrspace(3)* undef
	; OPT-NEXT: ret void			; OPT-NEXT: ret void
	define amdgpu_kernel void @invert_true_phi_cond_break_loop(i32 %arg) #0 {			define amdgpu_kernel void @invert_true_phi_cond_break_loop(i32 %arg) #0 {
	bb:			bb:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%tmp = sub i32 %id, %arg			%tmp = sub i32 %id, %arg
	br label %bb1			br label %bb1


test/CodeGen/AMDGPU/multi-divergent-exit-region.ll

; RUN: opt -mtriple=amdgcn-- -S -amdgpu-unify-divergent-exit-nodes -verify -structurizecfg -verify -si-annotate-control-flow %s | FileCheck -check-prefix=IR %s		; RUN: opt -mtriple=amdgcn-- -S -amdgpu-unify-divergent-exit-nodes -verify -structurizecfg -verify -si-annotate-control-flow %s | FileCheck -check-prefix=IR %s
; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

; Add an extra verifier runs. There were some cases where invalid IR		; Add an extra verifier runs. There were some cases where invalid IR
; was produced but happened to be fixed by the later passes.		; was produced but happened to be fixed by the later passes.

; Make sure divergent control flow with multiple exits from a region		; Make sure divergent control flow with multiple exits from a region
; is properly handled. UnifyFunctionExitNodes should be run before		; is properly handled. UnifyFunctionExitNodes should be run before
; StructurizeCFG.		; StructurizeCFG.

; IR-LABEL: @multi_divergent_region_exit_ret_ret(		; IR-LABEL: @multi_divergent_region_exit_ret_ret(
; IR: %1 = call { i1, i64 } @llvm.amdgcn.if(i1 %0)		; IR: %1 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %0)
; IR: %2 = extractvalue { i1, i64 } %1, 0		; IR: %2 = extractvalue { i1, i64 } %1, 0
; IR: %3 = extractvalue { i1, i64 } %1, 1		; IR: %3 = extractvalue { i1, i64 } %1, 1
; IR: br i1 %2, label %LeafBlock1, label %Flow		; IR: br i1 %2, label %LeafBlock1, label %Flow

; IR: Flow:		; IR: Flow:
; IR: %4 = phi i1 [ true, %LeafBlock1 ], [ false, %entry ]		; IR: %4 = phi i1 [ true, %LeafBlock1 ], [ false, %entry ]
; IR: %5 = phi i1 [ %10, %LeafBlock1 ], [ false, %entry ]		; IR: %5 = phi i1 [ %10, %LeafBlock1 ], [ false, %entry ]
; IR: %6 = call { i1, i64 } @llvm.amdgcn.else(i64 %3)		; IR: %6 = call { i1, i64 } @llvm.amdgcn.else.i64.i64(i64 %3)
; IR: %7 = extractvalue { i1, i64 } %6, 0		; IR: %7 = extractvalue { i1, i64 } %6, 0
; IR: %8 = extractvalue { i1, i64 } %6, 1		; IR: %8 = extractvalue { i1, i64 } %6, 1
; IR: br i1 %7, label %LeafBlock, label %Flow1		; IR: br i1 %7, label %LeafBlock, label %Flow1

; IR: LeafBlock:		; IR: LeafBlock:
; IR: br label %Flow1		; IR: br label %Flow1

; IR: LeafBlock1:		; IR: LeafBlock1:
; IR: br label %Flow{{$}}		; IR: br label %Flow{{$}}

; IR: Flow2:		; IR: Flow2:
; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]		; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]
; IR: call void @llvm.amdgcn.end.cf(i64 %19)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %19)
; IR: %12 = call { i1, i64 } @llvm.amdgcn.if(i1 %11)		; IR: %12 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %11)
; IR: %13 = extractvalue { i1, i64 } %12, 0		; IR: %13 = extractvalue { i1, i64 } %12, 0
; IR: %14 = extractvalue { i1, i64 } %12, 1		; IR: %14 = extractvalue { i1, i64 } %12, 1
; IR: br i1 %13, label %exit0, label %UnifiedReturnBlock		; IR: br i1 %13, label %exit0, label %UnifiedReturnBlock

; IR: exit0:		; IR: exit0:
; IR: store volatile i32 9, i32 addrspace(1)* undef		; IR: store volatile i32 9, i32 addrspace(1)* undef
; IR: br label %UnifiedReturnBlock		; IR: br label %UnifiedReturnBlock

; IR: Flow1:		; IR: Flow1:
; IR: %15 = phi i1 [ %SwitchLeaf, %LeafBlock ], [ %4, %Flow ]		; IR: %15 = phi i1 [ %SwitchLeaf, %LeafBlock ], [ %4, %Flow ]
; IR: %16 = phi i1 [ %9, %LeafBlock ], [ %5, %Flow ]		; IR: %16 = phi i1 [ %9, %LeafBlock ], [ %5, %Flow ]
; IR: call void @llvm.amdgcn.end.cf(i64 %8)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %8)
; IR: %17 = call { i1, i64 } @llvm.amdgcn.if(i1 %16)		; IR: %17 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %16)
; IR: %18 = extractvalue { i1, i64 } %17, 0		; IR: %18 = extractvalue { i1, i64 } %17, 0
; IR: %19 = extractvalue { i1, i64 } %17, 1		; IR: %19 = extractvalue { i1, i64 } %17, 1
; IR: br i1 %18, label %exit1, label %Flow2		; IR: br i1 %18, label %exit1, label %Flow2

; IR: exit1:		; IR: exit1:
; IR: store volatile i32 17, i32 addrspace(3)* undef		; IR: store volatile i32 17, i32 addrspace(3)* undef
; IR: br label %Flow2		; IR: br label %Flow2

; IR: UnifiedReturnBlock:		; IR: UnifiedReturnBlock:
; IR: call void @llvm.amdgcn.end.cf(i64 %14)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %14)
; IR: ret void		; IR: ret void


; GCN-LABEL: {{^}}multi_divergent_region_exit_ret_ret:		; GCN-LABEL: {{^}}multi_divergent_region_exit_ret_ret:

; GCN-DAG: s_mov_b64 [[EXIT1:s\[[0-9]+:[0-9]+\]]], 0		; GCN-DAG: s_mov_b64 [[EXIT1:s\[[0-9]+:[0-9]+\]]], 0
; GCN-DAG: v_cmp_lt_i32_e32 vcc, 1,		; GCN-DAG: v_cmp_lt_i32_e32 vcc, 1,
; GCN-DAG: s_mov_b64 [[EXIT0:s\[[0-9]+:[0-9]+\]]], 0		; GCN-DAG: s_mov_b64 [[EXIT0:s\[[0-9]+:[0-9]+\]]], 0
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	exit0: ; preds = %LeafBlock, %LeafBlock1
ret void		ret void

exit1: ; preds = %LeafBlock, %LeafBlock1		exit1: ; preds = %LeafBlock, %LeafBlock1
store volatile i32 17, i32 addrspace(3)* undef		store volatile i32 17, i32 addrspace(3)* undef
ret void		ret void
}		}

; IR-LABEL: @multi_divergent_region_exit_unreachable_unreachable(		; IR-LABEL: @multi_divergent_region_exit_unreachable_unreachable(
; IR: %1 = call { i1, i64 } @llvm.amdgcn.if(i1 %0)		; IR: %1 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %0)

; IR: %6 = call { i1, i64 } @llvm.amdgcn.else(i64 %3)		; IR: %6 = call { i1, i64 } @llvm.amdgcn.else.i64.i64(i64 %3)

; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]		; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]
; IR: call void @llvm.amdgcn.end.cf(i64 %19)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %19)
; IR: %12 = call { i1, i64 } @llvm.amdgcn.if(i1 %11)		; IR: %12 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %11)
; IR: br i1 %13, label %exit0, label %UnifiedUnreachableBlock		; IR: br i1 %13, label %exit0, label %UnifiedUnreachableBlock


; IR: UnifiedUnreachableBlock:		; IR: UnifiedUnreachableBlock:
; IR-NEXT: unreachable		; IR-NEXT: unreachable


; FIXME: Probably should insert an s_endpgm anyway.		; FIXME: Probably should insert an s_endpgm anyway.
Show All 39 Lines
; IR-LABEL: @multi_exit_region_divergent_ret_uniform_ret(		; IR-LABEL: @multi_exit_region_divergent_ret_uniform_ret(
; IR: %divergent.cond0 = icmp slt i32 %tmp16, 2		; IR: %divergent.cond0 = icmp slt i32 %tmp16, 2
; IR: llvm.amdgcn.if		; IR: llvm.amdgcn.if
; IR: br i1		; IR: br i1

; IR: {{^}}Flow:		; IR: {{^}}Flow:
; IR: %4 = phi i1 [ true, %LeafBlock1 ], [ false, %entry ]		; IR: %4 = phi i1 [ true, %LeafBlock1 ], [ false, %entry ]
; IR: %5 = phi i1 [ %10, %LeafBlock1 ], [ false, %entry ]		; IR: %5 = phi i1 [ %10, %LeafBlock1 ], [ false, %entry ]
; IR: %6 = call { i1, i64 } @llvm.amdgcn.else(i64 %3)		; IR: %6 = call { i1, i64 } @llvm.amdgcn.else.i64.i64(i64 %3)
; IR: br i1 %7, label %LeafBlock, label %Flow1		; IR: br i1 %7, label %LeafBlock, label %Flow1

; IR: {{^}}LeafBlock:		; IR: {{^}}LeafBlock:
; IR: %divergent.cond1 = icmp eq i32 %tmp16, 1		; IR: %divergent.cond1 = icmp eq i32 %tmp16, 1
; IR: %9 = xor i1 %divergent.cond1, true		; IR: %9 = xor i1 %divergent.cond1, true
; IR: br label %Flow1		; IR: br label %Flow1

; IR: LeafBlock1:		; IR: LeafBlock1:
; IR: %uniform.cond0 = icmp eq i32 %arg3, 2		; IR: %uniform.cond0 = icmp eq i32 %arg3, 2
; IR: %10 = xor i1 %uniform.cond0, true		; IR: %10 = xor i1 %uniform.cond0, true
; IR: br label %Flow		; IR: br label %Flow

; IR: Flow2:		; IR: Flow2:
; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]		; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]
; IR: call void @llvm.amdgcn.end.cf(i64 %19)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %19)
; IR: %12 = call { i1, i64 } @llvm.amdgcn.if(i1 %11)		; IR: %12 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %11)
; IR: br i1 %13, label %exit0, label %UnifiedReturnBlock		; IR: br i1 %13, label %exit0, label %UnifiedReturnBlock

; IR: exit0:		; IR: exit0:
; IR: store volatile i32 9, i32 addrspace(1)* undef		; IR: store volatile i32 9, i32 addrspace(1)* undef
; IR: br label %UnifiedReturnBlock		; IR: br label %UnifiedReturnBlock

; IR: {{^}}Flow1:		; IR: {{^}}Flow1:
; IR: %15 = phi i1 [ %divergent.cond1, %LeafBlock ], [ %4, %Flow ]		; IR: %15 = phi i1 [ %divergent.cond1, %LeafBlock ], [ %4, %Flow ]
; IR: %16 = phi i1 [ %9, %LeafBlock ], [ %5, %Flow ]		; IR: %16 = phi i1 [ %9, %LeafBlock ], [ %5, %Flow ]
; IR: call void @llvm.amdgcn.end.cf(i64 %8)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %8)
; IR: %17 = call { i1, i64 } @llvm.amdgcn.if(i1 %16)		; IR: %17 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %16)
; IR: %18 = extractvalue { i1, i64 } %17, 0		; IR: %18 = extractvalue { i1, i64 } %17, 0
; IR: %19 = extractvalue { i1, i64 } %17, 1		; IR: %19 = extractvalue { i1, i64 } %17, 1
; IR: br i1 %18, label %exit1, label %Flow2		; IR: br i1 %18, label %exit1, label %Flow2

; IR: exit1:		; IR: exit1:
; IR: store volatile i32 17, i32 addrspace(3)* undef		; IR: store volatile i32 17, i32 addrspace(3)* undef
; IR: br label %Flow2		; IR: br label %Flow2

; IR: UnifiedReturnBlock:		; IR: UnifiedReturnBlock:
; IR: call void @llvm.amdgcn.end.cf(i64 %14)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %14)
; IR: ret void		; IR: ret void
define amdgpu_kernel void @multi_exit_region_divergent_ret_uniform_ret(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2, i32 %arg3) #0 {		define amdgpu_kernel void @multi_exit_region_divergent_ret_uniform_ret(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2, i32 %arg3) #0 {
entry:		entry:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #1		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #1
%tmp1 = add i32 0, %tmp		%tmp1 = add i32 0, %tmp
%tmp2 = zext i32 %tmp1 to i64		%tmp2 = zext i32 %tmp1 to i64
%tmp3 = add i64 0, %tmp2		%tmp3 = add i64 0, %tmp2
%tmp4 = shl i64 %tmp3, 32		%tmp4 = shl i64 %tmp3, 32
Show All 22 Lines	exit0: ; preds = %LeafBlock, %LeafBlock1
ret void		ret void

exit1: ; preds = %LeafBlock, %LeafBlock1		exit1: ; preds = %LeafBlock, %LeafBlock1
store volatile i32 17, i32 addrspace(3)* undef		store volatile i32 17, i32 addrspace(3)* undef
ret void		ret void
}		}

; IR-LABEL: @multi_exit_region_uniform_ret_divergent_ret(		; IR-LABEL: @multi_exit_region_uniform_ret_divergent_ret(
; IR: %1 = call { i1, i64 } @llvm.amdgcn.if(i1 %0)		; IR: %1 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %0)
; IR: br i1 %2, label %LeafBlock1, label %Flow		; IR: br i1 %2, label %LeafBlock1, label %Flow

; IR: Flow:		; IR: Flow:
; IR: %4 = phi i1 [ true, %LeafBlock1 ], [ false, %entry ]		; IR: %4 = phi i1 [ true, %LeafBlock1 ], [ false, %entry ]
; IR: %5 = phi i1 [ %10, %LeafBlock1 ], [ false, %entry ]		; IR: %5 = phi i1 [ %10, %LeafBlock1 ], [ false, %entry ]
; IR: %6 = call { i1, i64 } @llvm.amdgcn.else(i64 %3)		; IR: %6 = call { i1, i64 } @llvm.amdgcn.else.i64.i64(i64 %3)

; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]		; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]
; IR: call void @llvm.amdgcn.end.cf(i64 %19)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %19)
; IR: %12 = call { i1, i64 } @llvm.amdgcn.if(i1 %11)		; IR: %12 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %11)

define amdgpu_kernel void @multi_exit_region_uniform_ret_divergent_ret(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2, i32 %arg3) #0 {		define amdgpu_kernel void @multi_exit_region_uniform_ret_divergent_ret(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2, i32 %arg3) #0 {
entry:		entry:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #1		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #1
%tmp1 = add i32 0, %tmp		%tmp1 = add i32 0, %tmp
%tmp2 = zext i32 %tmp1 to i64		%tmp2 = zext i32 %tmp1 to i64
%tmp3 = add i64 0, %tmp2		%tmp3 = add i64 0, %tmp2
%tmp4 = shl i64 %tmp3, 32		%tmp4 = shl i64 %tmp3, 32
Show All 24 Lines
exit1: ; preds = %LeafBlock, %LeafBlock1		exit1: ; preds = %LeafBlock, %LeafBlock1
store volatile i32 17, i32 addrspace(3)* undef		store volatile i32 17, i32 addrspace(3)* undef
ret void		ret void
}		}

; IR-LABEL: @multi_divergent_region_exit_ret_ret_return_value(		; IR-LABEL: @multi_divergent_region_exit_ret_ret_return_value(
; IR: Flow2:		; IR: Flow2:
; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]		; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]
; IR: call void @llvm.amdgcn.end.cf(i64 %19)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %19)

; IR: UnifiedReturnBlock:		; IR: UnifiedReturnBlock:
; IR: %UnifiedRetVal = phi float [ 2.000000e+00, %Flow2 ], [ 1.000000e+00, %exit0 ]		; IR: %UnifiedRetVal = phi float [ 2.000000e+00, %Flow2 ], [ 1.000000e+00, %exit0 ]
; IR: call void @llvm.amdgcn.end.cf(i64 %14)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %14)
; IR: ret float %UnifiedRetVal		; IR: ret float %UnifiedRetVal
define amdgpu_ps float @multi_divergent_region_exit_ret_ret_return_value(i32 %vgpr) #0 {		define amdgpu_ps float @multi_divergent_region_exit_ret_ret_return_value(i32 %vgpr) #0 {
entry:		entry:
%Pivot = icmp slt i32 %vgpr, 2		%Pivot = icmp slt i32 %vgpr, 2
br i1 %Pivot, label %LeafBlock, label %LeafBlock1		br i1 %Pivot, label %LeafBlock, label %LeafBlock1

LeafBlock: ; preds = %entry		LeafBlock: ; preds = %entry
%SwitchLeaf = icmp eq i32 %vgpr, 1		%SwitchLeaf = icmp eq i32 %vgpr, 1
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines	exit0: ; preds = %LeafBlock, %LeafBlock1
ret float 1.0		ret float 1.0

exit1: ; preds = %LeafBlock, %LeafBlock1		exit1: ; preds = %LeafBlock, %LeafBlock1
store i32 17, i32 addrspace(3)* undef		store i32 17, i32 addrspace(3)* undef
ret float 2.0		ret float 2.0
}		}

; IR-LABEL: @multi_divergent_region_exit_ret_unreachable(		; IR-LABEL: @multi_divergent_region_exit_ret_unreachable(
; IR: %1 = call { i1, i64 } @llvm.amdgcn.if(i1 %0)		; IR: %1 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %0)

; IR: Flow:		; IR: Flow:
; IR: %4 = phi i1 [ true, %LeafBlock1 ], [ false, %entry ]		; IR: %4 = phi i1 [ true, %LeafBlock1 ], [ false, %entry ]
; IR: %5 = phi i1 [ %10, %LeafBlock1 ], [ false, %entry ]		; IR: %5 = phi i1 [ %10, %LeafBlock1 ], [ false, %entry ]
; IR: %6 = call { i1, i64 } @llvm.amdgcn.else(i64 %3)		; IR: %6 = call { i1, i64 } @llvm.amdgcn.else.i64.i64(i64 %3)

; IR: Flow2:		; IR: Flow2:
; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]		; IR: %11 = phi i1 [ false, %exit1 ], [ %15, %Flow1 ]
; IR: call void @llvm.amdgcn.end.cf(i64 %19)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %19)
; IR: %12 = call { i1, i64 } @llvm.amdgcn.if(i1 %11)		; IR: %12 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %11)
; IR: br i1 %13, label %exit0, label %UnifiedReturnBlock		; IR: br i1 %13, label %exit0, label %UnifiedReturnBlock

; IR: exit0:		; IR: exit0:
; IR-NEXT: store volatile i32 17, i32 addrspace(3)* undef		; IR-NEXT: store volatile i32 17, i32 addrspace(3)* undef
; IR-NEXT: br label %UnifiedReturnBlock		; IR-NEXT: br label %UnifiedReturnBlock

; IR: Flow1:		; IR: Flow1:
; IR: %15 = phi i1 [ %SwitchLeaf, %LeafBlock ], [ %4, %Flow ]		; IR: %15 = phi i1 [ %SwitchLeaf, %LeafBlock ], [ %4, %Flow ]
; IR: %16 = phi i1 [ %9, %LeafBlock ], [ %5, %Flow ]		; IR: %16 = phi i1 [ %9, %LeafBlock ], [ %5, %Flow ]
; IR: call void @llvm.amdgcn.end.cf(i64 %8)		; IR: call void @llvm.amdgcn.end.cf.i64(i64 %8)
; IR: %17 = call { i1, i64 } @llvm.amdgcn.if(i1 %16)		; IR: %17 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %16)
; IR: %18 = extractvalue { i1, i64 } %17, 0		; IR: %18 = extractvalue { i1, i64 } %17, 0
; IR: %19 = extractvalue { i1, i64 } %17, 1		; IR: %19 = extractvalue { i1, i64 } %17, 1
; IR: br i1 %18, label %exit1, label %Flow2		; IR: br i1 %18, label %exit1, label %Flow2

; IR: exit1:		; IR: exit1:
; IR-NEXT: store volatile i32 9, i32 addrspace(1)* undef		; IR-NEXT: store volatile i32 9, i32 addrspace(1)* undef
; IR-NEXT: call void @llvm.amdgcn.unreachable()		; IR-NEXT: call void @llvm.amdgcn.unreachable()
; IR-NEXT: br label %Flow2		; IR-NEXT: br label %Flow2

; IR: UnifiedReturnBlock:		; IR: UnifiedReturnBlock:
; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %14)		; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %14)
; IR-NEXT: ret void		; IR-NEXT: ret void
define amdgpu_kernel void @multi_divergent_region_exit_ret_unreachable(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2) #0 {		define amdgpu_kernel void @multi_divergent_region_exit_ret_unreachable(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2) #0 {
entry:		entry:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #1		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #1
%tmp1 = add i32 0, %tmp		%tmp1 = add i32 0, %tmp
%tmp2 = zext i32 %tmp1 to i64		%tmp2 = zext i32 %tmp1 to i64
%tmp3 = add i64 0, %tmp2		%tmp3 = add i64 0, %tmp2
%tmp4 = shl i64 %tmp3, 32		%tmp4 = shl i64 %tmp3, 32
Show All 39 Lines
; IR: indirect.exit1:		; IR: indirect.exit1:
; IR: %load = load volatile i32, i32 addrspace(1)* undef		; IR: %load = load volatile i32, i32 addrspace(1)* undef
; IR: store volatile i32 %load, i32 addrspace(1)* undef		; IR: store volatile i32 %load, i32 addrspace(1)* undef
; IR: store volatile i32 9, i32 addrspace(1)* undef		; IR: store volatile i32 9, i32 addrspace(1)* undef
; IR: call void @llvm.amdgcn.unreachable()		; IR: call void @llvm.amdgcn.unreachable()
; IR-NEXT: br label %Flow2		; IR-NEXT: br label %Flow2

; IR: UnifiedReturnBlock: ; preds = %exit0, %Flow2		; IR: UnifiedReturnBlock: ; preds = %exit0, %Flow2
; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %14)		; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %14)
; IR-NEXT: ret void		; IR-NEXT: ret void
define amdgpu_kernel void @indirect_multi_divergent_region_exit_ret_unreachable(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2) #0 {		define amdgpu_kernel void @indirect_multi_divergent_region_exit_ret_unreachable(i32 addrspace(1)* nocapture %arg0, i32 addrspace(1)* nocapture %arg1, i32 addrspace(1)* nocapture %arg2) #0 {
entry:		entry:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #1		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x() #1
%tmp1 = add i32 0, %tmp		%tmp1 = add i32 0, %tmp
%tmp2 = zext i32 %tmp1 to i64		%tmp2 = zext i32 %tmp1 to i64
%tmp3 = add i64 0, %tmp2		%tmp3 = add i64 0, %tmp2
%tmp4 = shl i64 %tmp3, 32		%tmp4 = shl i64 %tmp3, 32
; IR: %8 = phi i1 [ false, %uniform.ret1 ], [ true, %uniform.multi.exit.region ]		; IR: %8 = phi i1 [ false, %uniform.ret1 ], [ true, %uniform.multi.exit.region ]
; IR: br i1 %8, label %uniform.if, label %Flow2		; IR: br i1 %8, label %uniform.if, label %Flow2

; IR: Flow: ; preds = %uniform.then, %uniform.if		; IR: Flow: ; preds = %uniform.then, %uniform.if
; IR: %11 = phi i1 [ %10, %uniform.then ], [ %9, %uniform.if ]		; IR: %11 = phi i1 [ %10, %uniform.then ], [ %9, %uniform.if ]
; IR: br i1 %11, label %uniform.endif, label %uniform.ret0		; IR: br i1 %11, label %uniform.endif, label %uniform.ret0

; IR: UnifiedReturnBlock: ; preds = %Flow3, %Flow2		; IR: UnifiedReturnBlock: ; preds = %Flow3, %Flow2
; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %6)		; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %6)
; IR-NEXT: ret void		; IR-NEXT: ret void
define amdgpu_kernel void @uniform_complex_multi_ret_nest_in_divergent_triangle(i32 %arg0) #0 {		define amdgpu_kernel void @uniform_complex_multi_ret_nest_in_divergent_triangle(i32 %arg0) #0 {
entry:		entry:
%id.x = tail call i32 @llvm.amdgcn.workitem.id.x()		%id.x = tail call i32 @llvm.amdgcn.workitem.id.x()
%divergent.cond0 = icmp eq i32 %id.x, 0		%divergent.cond0 = icmp eq i32 %id.x, 0
br i1 %divergent.cond0, label %uniform.multi.exit.region, label %divergent.ret		br i1 %divergent.cond0, label %uniform.multi.exit.region, label %divergent.ret

uniform.multi.exit.region:		uniform.multi.exit.region:
}		}

; IR-LABEL: @multi_divergent_unreachable_exit(		; IR-LABEL: @multi_divergent_unreachable_exit(
; IR: UnifiedUnreachableBlock:		; IR: UnifiedUnreachableBlock:
; IR-NEXT: call void @llvm.amdgcn.unreachable()		; IR-NEXT: call void @llvm.amdgcn.unreachable()
; IR-NEXT: br label %UnifiedReturnBlock		; IR-NEXT: br label %UnifiedReturnBlock

; IR: UnifiedReturnBlock:		; IR: UnifiedReturnBlock:
; IR-NEXT: call void @llvm.amdgcn.end.cf(i64		; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64
; IR-NEXT: ret void		; IR-NEXT: ret void
define amdgpu_kernel void @multi_divergent_unreachable_exit() #0 {		define amdgpu_kernel void @multi_divergent_unreachable_exit() #0 {
bb:		bb:
%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()		%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
switch i32 %tmp, label %bb3 [		switch i32 %tmp, label %bb3 [
i32 2, label %bb1		i32 2, label %bb1
i32 0, label %bb2		i32 0, label %bb2
]		]

test/CodeGen/AMDGPU/multilevel-break.ll

	; RUN: opt -S -mtriple=amdgcn-- -structurizecfg -si-annotate-control-flow < %s | FileCheck -check-prefix=OPT %s			; RUN: opt -S -mtriple=amdgcn-- -structurizecfg -si-annotate-control-flow < %s | FileCheck -check-prefix=OPT %s
	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; OPT-LABEL: {{^}}define amdgpu_vs void @multi_else_break(			; OPT-LABEL: {{^}}define amdgpu_vs void @multi_else_break(
	; OPT: main_body:			; OPT: main_body:
	; OPT: LOOP.outer:			; OPT: LOOP.outer:
	; OPT: LOOP:			; OPT: LOOP:
	; OPT: [[if:%[0-9]+]] = call { i1, i64 } @llvm.amdgcn.if(			; OPT: [[if:%[0-9]+]] = call { i1, i64 } @llvm.amdgcn.if.i64(
	; OPT: [[if_exec:%[0-9]+]] = extractvalue { i1, i64 } [[if]], 1			; OPT: [[if_exec:%[0-9]+]] = extractvalue { i1, i64 } [[if]], 1
	;			;
	; OPT: Flow:			; OPT: Flow:
	;			;
	; Ensure two if.break calls, for both the inner and outer loops			; Ensure two if.break calls, for both the inner and outer loops

	; OPT: call void @llvm.amdgcn.end.cf			; OPT: call void @llvm.amdgcn.end.cf
	; OPT-NEXT: call i64 @llvm.amdgcn.if.break(i1			; OPT-NEXT: call i64 @llvm.amdgcn.if.break.i64.i64(i1
	; OPT-NEXT: call i1 @llvm.amdgcn.loop(i64			; OPT-NEXT: call i1 @llvm.amdgcn.loop.i64(i64
	; OPT-NEXT: call i64 @llvm.amdgcn.if.break(i1			; OPT-NEXT: call i64 @llvm.amdgcn.if.break.i64.i64(i1
	;			;
	; OPT: Flow1:			; OPT: Flow1:

	; GCN-LABEL: {{^}}multi_else_break:			; GCN-LABEL: {{^}}multi_else_break:

	; GCN: ; %main_body			; GCN: ; %main_body
	; GCN: s_mov_b64 [[LEFT_OUTER:s\[[0-9]+:[0-9]+\]]], 0{{$}}			; GCN: s_mov_b64 [[LEFT_OUTER:s\[[0-9]+:[0-9]+\]]], 0{{$}}


test/CodeGen/AMDGPU/nested-loop-conditions.ll

	; RUN: opt -mtriple=amdgcn-- -S -structurizecfg -si-annotate-control-flow %s | FileCheck -check-prefix=IR %s			; RUN: opt -mtriple=amdgcn-- -S -structurizecfg -si-annotate-control-flow %s | FileCheck -check-prefix=IR %s
	; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; After structurizing, there are 3 levels of loops. The i1 phi			; After structurizing, there are 3 levels of loops. The i1 phi
	; conditions mutually depend on each other, so it isn't safe to delete			; conditions mutually depend on each other, so it isn't safe to delete
	; the condition that appears to have no uses until the loop is			; the condition that appears to have no uses until the loop is
	; completely processed.			; completely processed.


	; IR-LABEL: @reduced_nested_loop_conditions(			; IR-LABEL: @reduced_nested_loop_conditions(

	; IR: bb5:			; IR: bb5:
	; IR-NEXT: %phi.broken = phi i64 [ %3, %bb10 ], [ 0, %bb ]			; IR-NEXT: %phi.broken = phi i64 [ %3, %bb10 ], [ 0, %bb ]
	; IR-NEXT: %tmp6 = phi i32 [ 0, %bb ], [ %tmp11, %bb10 ]			; IR-NEXT: %tmp6 = phi i32 [ 0, %bb ], [ %tmp11, %bb10 ]
	; IR-NEXT: %tmp7 = icmp eq i32 %tmp6, 1			; IR-NEXT: %tmp7 = icmp eq i32 %tmp6, 1
	; IR-NEXT: %0 = call { i1, i64 } @llvm.amdgcn.if(i1 %tmp7)			; IR-NEXT: %0 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %tmp7)
	; IR-NEXT: %1 = extractvalue { i1, i64 } %0, 0			; IR-NEXT: %1 = extractvalue { i1, i64 } %0, 0
	; IR-NEXT: %2 = extractvalue { i1, i64 } %0, 1			; IR-NEXT: %2 = extractvalue { i1, i64 } %0, 1
	; IR-NEXT: br i1 %1, label %bb8, label %Flow			; IR-NEXT: br i1 %1, label %bb8, label %Flow

	; IR: bb8:			; IR: bb8:
	; IR-NEXT: br label %bb13			; IR-NEXT: br label %bb13

	; IR: bb10:			; IR: bb10:
	; IR-NEXT: %tmp11 = phi i32 [ %6, %Flow ]			; IR-NEXT: %tmp11 = phi i32 [ %6, %Flow ]
	; IR-NEXT: %tmp12 = phi i1 [ %5, %Flow ]			; IR-NEXT: %tmp12 = phi i1 [ %5, %Flow ]
	; IR-NEXT: %3 = call i64 @llvm.amdgcn.if.break(i1 %tmp12, i64 %phi.broken)			; IR-NEXT: %3 = call i64 @llvm.amdgcn.if.break.i64.i64(i1 %tmp12, i64 %phi.broken)
	; IR-NEXT: %4 = call i1 @llvm.amdgcn.loop(i64 %3)			; IR-NEXT: %4 = call i1 @llvm.amdgcn.loop.i64(i64 %3)
	; IR-NEXT: br i1 %4, label %bb23, label %bb5			; IR-NEXT: br i1 %4, label %bb23, label %bb5

	; IR: Flow:			; IR: Flow:
	; IR-NEXT: %5 = phi i1 [ %tmp22, %bb4 ], [ true, %bb5 ]			; IR-NEXT: %5 = phi i1 [ %tmp22, %bb4 ], [ true, %bb5 ]
	; IR-NEXT: %6 = phi i32 [ %tmp21, %bb4 ], [ undef, %bb5 ]			; IR-NEXT: %6 = phi i32 [ %tmp21, %bb4 ], [ undef, %bb5 ]
	; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %2)			; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %2)
	; IR-NEXT: br label %bb10			; IR-NEXT: br label %bb10

	; IR: bb13:			; IR: bb13:
	; IR-NEXT: %tmp14 = phi i1 [ %tmp22, %bb3 ], [ true, %bb8 ]			; IR-NEXT: %tmp14 = phi i1 [ %tmp22, %bb3 ], [ true, %bb8 ]
	; IR-NEXT: %tmp15 = bitcast i64 %tmp2 to <2 x i32>			; IR-NEXT: %tmp15 = bitcast i64 %tmp2 to <2 x i32>
	; IR-NEXT: br i1 %tmp14, label %bb16, label %bb20			; IR-NEXT: br i1 %tmp14, label %bb16, label %bb20

	; IR: bb16:			; IR: bb16:
	; IR-NEXT: %tmp17 = extractelement <2 x i32> %tmp15, i64 1			; IR-NEXT: %tmp17 = extractelement <2 x i32> %tmp15, i64 1
	; IR-NEXT: %tmp18 = getelementptr inbounds i32, i32 addrspace(3)* undef, i32 %tmp17			; IR-NEXT: %tmp18 = getelementptr inbounds i32, i32 addrspace(3)* undef, i32 %tmp17
	; IR-NEXT: %tmp19 = load volatile i32, i32 addrspace(3)* %tmp18			; IR-NEXT: %tmp19 = load volatile i32, i32 addrspace(3)* %tmp18
	; IR-NEXT: br label %bb20			; IR-NEXT: br label %bb20

	; IR: bb20:			; IR: bb20:
	; IR-NEXT: %tmp21 = phi i32 [ %tmp19, %bb16 ], [ 0, %bb13 ]			; IR-NEXT: %tmp21 = phi i32 [ %tmp19, %bb16 ], [ 0, %bb13 ]
	; IR-NEXT: %tmp22 = phi i1 [ false, %bb16 ], [ %tmp14, %bb13 ]			; IR-NEXT: %tmp22 = phi i1 [ false, %bb16 ], [ %tmp14, %bb13 ]
	; IR-NEXT: br label %bb9			; IR-NEXT: br label %bb9

	; IR: bb23:			; IR: bb23:
	; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %3)			; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %3)
	; IR-NEXT: ret void			; IR-NEXT: ret void

	; GCN-LABEL: {{^}}reduced_nested_loop_conditions:			; GCN-LABEL: {{^}}reduced_nested_loop_conditions:

	; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 1			; GCN: s_cmp_lg_u32 s{{[0-9]+}}, 1
	; GCN-NEXT: s_cbranch_scc0			; GCN-NEXT: s_cbranch_scc0

	; FIXME: Should fold to unconditional branch?			; FIXME: Should fold to unconditional branch?
	bb23: ; preds = %bb10			bb23: ; preds = %bb10
	ret void			ret void
	}			}

	; Earlier version of above, before a run of the structurizer.			; Earlier version of above, before a run of the structurizer.
	; IR-LABEL: @nested_loop_conditions(			; IR-LABEL: @nested_loop_conditions(

	; IR: Flow3:			; IR: Flow3:
	; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %21)			; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %21)
	; IR-NEXT: %0 = call { i1, i64 } @llvm.amdgcn.if(i1 %14)			; IR-NEXT: %0 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %14)
	; IR-NEXT: %1 = extractvalue { i1, i64 } %0, 0			; IR-NEXT: %1 = extractvalue { i1, i64 } %0, 0
	; IR-NEXT: %2 = extractvalue { i1, i64 } %0, 1			; IR-NEXT: %2 = extractvalue { i1, i64 } %0, 1
	; IR-NEXT: br i1 %1, label %bb4.bb13_crit_edge, label %Flow4			; IR-NEXT: br i1 %1, label %bb4.bb13_crit_edge, label %Flow4

	; IR: Flow4:			; IR: Flow4:
	; IR-NEXT: %3 = phi i1 [ true, %bb4.bb13_crit_edge ], [ false, %Flow3 ]			; IR-NEXT: %3 = phi i1 [ true, %bb4.bb13_crit_edge ], [ false, %Flow3 ]
	; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %2)			; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %2)
	; IR-NEXT: br label %Flow			; IR-NEXT: br label %Flow

	; IR: Flow:			; IR: Flow:
	; IR-NEXT: %4 = phi i1 [ %3, %Flow4 ], [ true, %bb ]			; IR-NEXT: %4 = phi i1 [ %3, %Flow4 ], [ true, %bb ]
	; IR-NEXT: %5 = call { i1, i64 } @llvm.amdgcn.if(i1 %4)			; IR-NEXT: %5 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %4)
	; IR-NEXT: %6 = extractvalue { i1, i64 } %5, 0			; IR-NEXT: %6 = extractvalue { i1, i64 } %5, 0
	; IR-NEXT: %7 = extractvalue { i1, i64 } %5, 1			; IR-NEXT: %7 = extractvalue { i1, i64 } %5, 1
	; IR-NEXT: br i1 %6, label %bb13, label %bb31			; IR-NEXT: br i1 %6, label %bb13, label %bb31

	; IR: bb14:			; IR: bb14:
	; IR: %tmp15 = icmp eq i32 %tmp1037, 1			; IR: %tmp15 = icmp eq i32 %tmp1037, 1
	; IR-NEXT: %8 = call { i1, i64 } @llvm.amdgcn.if(i1 %tmp15)			; IR-NEXT: %8 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %tmp15)

	; IR: Flow1:			; IR: Flow1:
	; IR-NEXT: %11 = phi <4 x i32> [ %tmp9, %bb21 ], [ undef, %bb14 ]			; IR-NEXT: %11 = phi <4 x i32> [ %tmp9, %bb21 ], [ undef, %bb14 ]
	; IR-NEXT: %12 = phi i32 [ %tmp10, %bb21 ], [ undef, %bb14 ]			; IR-NEXT: %12 = phi i32 [ %tmp10, %bb21 ], [ undef, %bb14 ]
	; IR-NEXT: %13 = phi i1 [ %18, %bb21 ], [ true, %bb14 ]			; IR-NEXT: %13 = phi i1 [ %18, %bb21 ], [ true, %bb14 ]
	; IR-NEXT: %14 = phi i1 [ %18, %bb21 ], [ false, %bb14 ]			; IR-NEXT: %14 = phi i1 [ %18, %bb21 ], [ false, %bb14 ]
	; IR-NEXT: %15 = phi i1 [ false, %bb21 ], [ true, %bb14 ]			; IR-NEXT: %15 = phi i1 [ false, %bb21 ], [ true, %bb14 ]
	; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %10)			; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %10)
	; IR-NEXT: %16 = call i64 @llvm.amdgcn.if.break(i1 %13, i64 %phi.broken)			; IR-NEXT: %16 = call i64 @llvm.amdgcn.if.break.i64.i64(i1 %13, i64 %phi.broken)
	; IR-NEXT: %17 = call i1 @llvm.amdgcn.loop(i64 %16)			; IR-NEXT: %17 = call i1 @llvm.amdgcn.loop.i64(i64 %16)
	; IR-NEXT: br i1 %17, label %Flow2, label %bb14			; IR-NEXT: br i1 %17, label %Flow2, label %bb14

	; IR: bb21:			; IR: bb21:
	; IR: %tmp12 = icmp slt i32 %tmp11, 9			; IR: %tmp12 = icmp slt i32 %tmp11, 9
	; IR-NEXT: %18 = xor i1 %tmp12, true			; IR-NEXT: %18 = xor i1 %tmp12, true
	; IR-NEXT: br label %Flow1			; IR-NEXT: br label %Flow1

	; IR: Flow2:			; IR: Flow2:
	; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %16)			; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %16)
	; IR-NEXT: %19 = call { i1, i64 } @llvm.amdgcn.if(i1 %15)			; IR-NEXT: %19 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %15)
	; IR-NEXT: %20 = extractvalue { i1, i64 } %19, 0			; IR-NEXT: %20 = extractvalue { i1, i64 } %19, 0
	; IR-NEXT: %21 = extractvalue { i1, i64 } %19, 1			; IR-NEXT: %21 = extractvalue { i1, i64 } %19, 1
	; IR-NEXT: br i1 %20, label %bb31.loopexit, label %Flow3			; IR-NEXT: br i1 %20, label %bb31.loopexit, label %Flow3

	; IR: bb31:			; IR: bb31:
	; IR-NEXT: call void @llvm.amdgcn.end.cf(i64 %7)			; IR-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %7)
	; IR-NEXT: store volatile i32 0, i32 addrspace(1)* undef			; IR-NEXT: store volatile i32 0, i32 addrspace(1)* undef
	; IR-NEXT: ret void			; IR-NEXT: ret void


	; GCN-LABEL: {{^}}nested_loop_conditions:			; GCN-LABEL: {{^}}nested_loop_conditions:

	; GCN: v_cmp_lt_i32_e32 vcc, 8, v			; GCN: v_cmp_lt_i32_e32 vcc, 8, v
	; GCN: s_and_b64 vcc, exec, vcc			; GCN: s_and_b64 vcc, exec, vcc

test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll

	; RUN: opt -mtriple=amdgcn-- -S -structurizecfg -si-annotate-control-flow %s | FileCheck -check-prefix=OPT %s			; RUN: opt -mtriple=amdgcn-- -S -structurizecfg -si-annotate-control-flow %s | FileCheck -check-prefix=OPT %s
	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s


	; OPT-LABEL: @annotate_unreachable(			; OPT-LABEL: @annotate_unreachable(
	; OPT: call { i1, i64 } @llvm.amdgcn.if(			; OPT: call { i1, i64 } @llvm.amdgcn.if.i64(
	; OPT-NOT: call void @llvm.amdgcn.end.cf(			; OPT-NOT: call void @llvm.amdgcn.end.cf


	; GCN-LABEL: {{^}}annotate_unreachable:			; GCN-LABEL: {{^}}annotate_unreachable:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN-NOT: s_endpgm			; GCN-NOT: s_endpgm
	; GCN: .Lfunc_end0			; GCN: .Lfunc_end0
	define amdgpu_kernel void @annotate_unreachable(<4 x float> addrspace(1)* noalias nocapture readonly %arg) #0 {			define amdgpu_kernel void @annotate_unreachable(<4 x float> addrspace(1)* noalias nocapture readonly %arg) #0 {
	bb:			bb:

test/CodeGen/AMDGPU/si-annotatecfg-multiple-backedges.ll

	; OPT-NEXT: [[TMP2:%.*]] = shl nsw i32 [[ARG:%.*]], 1			; OPT-NEXT: [[TMP2:%.*]] = shl nsw i32 [[ARG:%.*]], 1
	; OPT-NEXT: br label [[LOOP:%.*]]			; OPT-NEXT: br label [[LOOP:%.*]]
	; OPT: loop:			; OPT: loop:
	; OPT-NEXT: [[PHI_BROKEN1:%.*]] = phi i64 [ [[TMP7:%.*]], [[LOOP_END:%.*]] ], [ [[PHI_BROKEN1]], [[LOOP]] ], [ 0, [[ENTRY:%.*]] ]			; OPT-NEXT: [[PHI_BROKEN1:%.*]] = phi i64 [ [[TMP7:%.*]], [[LOOP_END:%.*]] ], [ [[PHI_BROKEN1]], [[LOOP]] ], [ 0, [[ENTRY:%.*]] ]
	; OPT-NEXT: [[PHI_BROKEN:%.*]] = phi i64 [ 0, [[LOOP_END]] ], [ [[TMP0:%.*]], [[LOOP]] ], [ 0, [[ENTRY]] ]			; OPT-NEXT: [[PHI_BROKEN:%.*]] = phi i64 [ 0, [[LOOP_END]] ], [ [[TMP0:%.*]], [[LOOP]] ], [ 0, [[ENTRY]] ]
	; OPT-NEXT: [[TMP4:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP5:%.*]], [[LOOP]] ], [ 0, [[LOOP_END]] ]			; OPT-NEXT: [[TMP4:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP5:%.*]], [[LOOP]] ], [ 0, [[LOOP_END]] ]
	; OPT-NEXT: [[TMP5]] = add nsw i32 [[TMP4]], [[TMP]]			; OPT-NEXT: [[TMP5]] = add nsw i32 [[TMP4]], [[TMP]]
	; OPT-NEXT: [[TMP6:%.*]] = icmp slt i32 [[ARG]], [[TMP5]]			; OPT-NEXT: [[TMP6:%.*]] = icmp slt i32 [[ARG]], [[TMP5]]
	; OPT-NEXT: [[TMP0]] = call i64 @llvm.amdgcn.if.break(i1 [[TMP6]], i64 [[PHI_BROKEN]])			; OPT-NEXT: [[TMP0]] = call i64 @llvm.amdgcn.if.break.i64.i64(i1 [[TMP6]], i64 [[PHI_BROKEN]])
	; OPT-NEXT: [[TMP1:%.*]] = call i1 @llvm.amdgcn.loop(i64 [[TMP0]])			; OPT-NEXT: [[TMP1:%.*]] = call i1 @llvm.amdgcn.loop.i64(i64 [[TMP0]])
	; OPT-NEXT: br i1 [[TMP1]], label [[LOOP_END]], label [[LOOP]]			; OPT-NEXT: br i1 [[TMP1]], label [[LOOP_END]], label [[LOOP]]
	; OPT: loop_end:			; OPT: loop_end:
	; OPT-NEXT: call void @llvm.amdgcn.end.cf(i64 [[TMP0]])			; OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 [[TMP0]])
	; OPT-NEXT: [[EXIT:%.*]] = icmp sgt i32 [[TMP5]], [[TMP2]]			; OPT-NEXT: [[EXIT:%.*]] = icmp sgt i32 [[TMP5]], [[TMP2]]
	; OPT-NEXT: [[TMP7]] = call i64 @llvm.amdgcn.if.break(i1 [[EXIT]], i64 [[PHI_BROKEN1]])			; OPT-NEXT: [[TMP7]] = call i64 @llvm.amdgcn.if.break.i64.i64(i1 [[EXIT]], i64 [[PHI_BROKEN1]])
	; OPT-NEXT: [[TMP3:%.*]] = call i1 @llvm.amdgcn.loop(i64 [[TMP7]])			; OPT-NEXT: [[TMP3:%.*]] = call i1 @llvm.amdgcn.loop.i64(i64 [[TMP7]])
	; OPT-NEXT: br i1 [[TMP3]], label [[LOOP_EXIT:%.*]], label [[LOOP]]			; OPT-NEXT: br i1 [[TMP3]], label [[LOOP_EXIT:%.*]], label [[LOOP]]
	; OPT: loop_exit:			; OPT: loop_exit:
	; OPT-NEXT: call void @llvm.amdgcn.end.cf(i64 [[TMP7]])			; OPT-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 [[TMP7]])
	; OPT-NEXT: [[TMP12:%.*]] = zext i32 [[TMP]] to i64			; OPT-NEXT: [[TMP12:%.*]] = zext i32 [[TMP]] to i64
	; OPT-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[ARG1:%.*]], i64 [[TMP12]]			; OPT-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[ARG1:%.*]], i64 [[TMP12]]
	; OPT-NEXT: [[TMP14:%.*]] = addrspacecast i32* [[TMP13]] to i32 addrspace(1)*			; OPT-NEXT: [[TMP14:%.*]] = addrspacecast i32* [[TMP13]] to i32 addrspace(1)*
	; OPT-NEXT: store i32 [[TMP5]], i32 addrspace(1)* [[TMP14]], align 4			; OPT-NEXT: store i32 [[TMP5]], i32 addrspace(1)* [[TMP14]], align 4
	; OPT-NEXT: ret void			; OPT-NEXT: ret void
	;			;
	entry:			entry:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()

test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

;		;
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float undef)		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float undef)
ret float %med		ret float %med
}		}

; --------------------------------------------------------------------		; --------------------------------------------------------------------
; llvm.amdgcn.icmp		; llvm.amdgcn.icmp
; --------------------------------------------------------------------		; --------------------------------------------------------------------

declare i64 @llvm.amdgcn.icmp.i32(i32, i32, i32 immarg) nounwind readnone convergent		declare i64 @llvm.amdgcn.icmp.i64.i32(i32, i32, i32 immarg) nounwind readnone convergent
declare i64 @llvm.amdgcn.icmp.i64(i64, i64, i32 immarg) nounwind readnone convergent		declare i64 @llvm.amdgcn.icmp.i64.i64(i64, i64, i32 immarg) nounwind readnone convergent
declare i64 @llvm.amdgcn.icmp.i1(i1, i1, i32 immarg) nounwind readnone convergent		declare i64 @llvm.amdgcn.icmp.i64.i1(i1, i1, i32 immarg) nounwind readnone convergent

define i64 @invalid_icmp_code(i32 %a, i32 %b) {		define i64 @invalid_icmp_code(i32 %a, i32 %b) {
; CHECK-LABEL: @invalid_icmp_code(		; CHECK-LABEL: @invalid_icmp_code(
; CHECK-NEXT: [[UNDER:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 31)		; CHECK-NEXT: [[UNDER:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 31)
; CHECK-NEXT: [[OVER:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A]], i32 [[B]], i32 42)		; CHECK-NEXT: [[OVER:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 42)
; CHECK-NEXT: [[OR:%.*]] = or i64 [[UNDER]], [[OVER]]		; CHECK-NEXT: [[OR:%.*]] = or i64 [[UNDER]], [[OVER]]
; CHECK-NEXT: ret i64 [[OR]]		; CHECK-NEXT: ret i64 [[OR]]
;		;
%under = call i64 @llvm.amdgcn.icmp.i32(i32 %a, i32 %b, i32 31)		%under = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 31)
%over = call i64 @llvm.amdgcn.icmp.i32(i32 %a, i32 %b, i32 42)		%over = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 42)
%or = or i64 %under, %over		%or = or i64 %under, %over
ret i64 %or		ret i64 %or
}		}

define i64 @icmp_constant_inputs_false() {		define i64 @icmp_constant_inputs_false() {
; CHECK-LABEL: @icmp_constant_inputs_false(		; CHECK-LABEL: @icmp_constant_inputs_false(
; CHECK-NEXT: ret i64 0		; CHECK-NEXT: ret i64 0
;		;
%result = call i64 @llvm.amdgcn.icmp.i32(i32 9, i32 8, i32 32)		%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 8, i32 32)
ret i64 %result		ret i64 %result
}		}

define i64 @icmp_constant_inputs_true() {		define i64 @icmp_constant_inputs_true() {
; CHECK-LABEL: @icmp_constant_inputs_true(		; CHECK-LABEL: @icmp_constant_inputs_true(
; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata !0) #5		; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata !0) #5
; CHECK-NEXT: ret i64 [[RESULT]]		; CHECK-NEXT: ret i64 [[RESULT]]
;		;
%result = call i64 @llvm.amdgcn.icmp.i32(i32 9, i32 8, i32 34)		%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 8, i32 34)
ret i64 %result		ret i64 %result
}		}

define i64 @icmp_constant_to_rhs_slt(i32 %x) {		define i64 @icmp_constant_to_rhs_slt(i32 %x) {
; CHECK-LABEL: @icmp_constant_to_rhs_slt(		; CHECK-LABEL: @icmp_constant_to_rhs_slt(
; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[X:%.*]], i32 9, i32 38)		; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[X:%.*]], i32 9, i32 38)
; CHECK-NEXT: ret i64 [[RESULT]]		; CHECK-NEXT: ret i64 [[RESULT]]
;		;
%result = call i64 @llvm.amdgcn.icmp.i32(i32 9, i32 %x, i32 40)		%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 %x, i32 40)
ret i64 %result		ret i64 %result
}		}

define i64 @fold_icmp_ne_0_zext_icmp_eq_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_eq_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i32(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i32 %a, %b		%cmp = icmp eq i32 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_ne_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_ne_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ne_i32(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ne_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ne i32 %a, %b		%cmp = icmp ne i32 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_sle_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_sle_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_sle_i32(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_sle_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 41)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 41)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp sle i32 %a, %b		%cmp = icmp sle i32 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_ugt_i64(i64 %a, i64 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_ugt_i64(i64 %a, i64 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ugt_i64(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ugt_i64(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64(i64 [[A:%.*]], i64 [[B:%.*]], i32 34)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[A:%.*]], i64 [[B:%.*]], i32 34)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ugt i64 %a, %b		%cmp = icmp ugt i64 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_ult_swap_i64(i64 %a, i64 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_ult_swap_i64(i64 %a, i64 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_swap_i64(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_swap_i64(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64(i64 [[A:%.*]], i64 [[B:%.*]], i32 34)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[A:%.*]], i64 [[B:%.*]], i32 34)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ugt i64 %a, %b		%cmp = icmp ugt i64 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 0, i32 %zext.cmp, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 0, i32 %zext.cmp, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f32(float %a, float %b) {		define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f32(float %a, float %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f32(		; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.f32(float [[A:%.*]], float [[B:%.*]], i32 1)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 1)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp oeq float %a, %b		%cmp = fcmp oeq float %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_fcmp_une_f32(float %a, float %b) {		define i64 @fold_icmp_ne_0_zext_fcmp_une_f32(float %a, float %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_une_f32(		; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_une_f32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.f32(float [[A:%.*]], float [[B:%.*]], i32 14)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 14)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp une float %a, %b		%cmp = fcmp une float %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_fcmp_olt_f64(double %a, double %b) {		define i64 @fold_icmp_ne_0_zext_fcmp_olt_f64(double %a, double %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_olt_f64(		; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_olt_f64(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.f64(double [[A:%.*]], double [[B:%.*]], i32 4)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f64(double [[A:%.*]], double [[B:%.*]], i32 4)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp olt double %a, %b		%cmp = fcmp olt double %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_sext_icmp_ne_0_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_sext_icmp_ne_0_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_sext_icmp_ne_0_i32(		; CHECK-LABEL: @fold_icmp_sext_icmp_ne_0_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i32 %a, %b		%cmp = icmp eq i32 %a, %b
%sext.cmp = sext i1 %cmp to i32		%sext.cmp = sext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %sext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_eq_0_zext_icmp_eq_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_eq_0_zext_icmp_eq_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_eq_0_zext_icmp_eq_i32(		; CHECK-LABEL: @fold_icmp_eq_0_zext_icmp_eq_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i32 %a, %b		%cmp = icmp eq i32 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_eq_0_zext_icmp_slt_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_eq_0_zext_icmp_slt_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_eq_0_zext_icmp_slt_i32(		; CHECK-LABEL: @fold_icmp_eq_0_zext_icmp_slt_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 39)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 39)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp slt i32 %a, %b		%cmp = icmp slt i32 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_eq_0_zext_fcmp_oeq_f32(float %a, float %b) {		define i64 @fold_icmp_eq_0_zext_fcmp_oeq_f32(float %a, float %b) {
; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_oeq_f32(		; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_oeq_f32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.f32(float [[A:%.*]], float [[B:%.*]], i32 14)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 14)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp oeq float %a, %b		%cmp = fcmp oeq float %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_eq_0_zext_fcmp_ule_f32(float %a, float %b) {		define i64 @fold_icmp_eq_0_zext_fcmp_ule_f32(float %a, float %b) {
; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_ule_f32(		; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_ule_f32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.f32(float [[A:%.*]], float [[B:%.*]], i32 2)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 2)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp ule float %a, %b		%cmp = fcmp ule float %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_eq_0_zext_fcmp_ogt_f32(float %a, float %b) {		define i64 @fold_icmp_eq_0_zext_fcmp_ogt_f32(float %a, float %b) {
; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_ogt_f32(		; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_ogt_f32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.f32(float [[A:%.*]], float [[B:%.*]], i32 13)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 13)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp ogt float %a, %b		%cmp = fcmp ogt float %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_zext_icmp_eq_1_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_zext_icmp_eq_1_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_zext_icmp_eq_1_i32(		; CHECK-LABEL: @fold_icmp_zext_icmp_eq_1_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i32 %a, %b		%cmp = icmp eq i32 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_zext_argi1_eq_1_i32(i1 %cond) {		define i64 @fold_icmp_zext_argi1_eq_1_i32(i1 %cond) {
; CHECK-LABEL: @fold_icmp_zext_argi1_eq_1_i32(		; CHECK-LABEL: @fold_icmp_zext_argi1_eq_1_i32(
; CHECK-NEXT: [[ZEXT_COND:%.*]] = zext i1 [[COND:%.*]] to i32		; CHECK-NEXT: [[ZEXT_COND:%.*]] = zext i1 [[COND:%.*]] to i32
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[ZEXT_COND]], i32 0, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_COND]], i32 0, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%zext.cond = zext i1 %cond to i32		%zext.cond = zext i1 %cond to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cond, i32 1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cond, i32 1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_zext_argi1_eq_neg1_i32(i1 %cond) {		define i64 @fold_icmp_zext_argi1_eq_neg1_i32(i1 %cond) {
; CHECK-LABEL: @fold_icmp_zext_argi1_eq_neg1_i32(		; CHECK-LABEL: @fold_icmp_zext_argi1_eq_neg1_i32(
; CHECK-NEXT: [[ZEXT_COND:%.*]] = zext i1 [[COND:%.*]] to i32		; CHECK-NEXT: [[ZEXT_COND:%.*]] = zext i1 [[COND:%.*]] to i32
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[ZEXT_COND]], i32 -1, i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_COND]], i32 -1, i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%zext.cond = zext i1 %cond to i32		%zext.cond = zext i1 %cond to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cond, i32 -1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cond, i32 -1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_sext_argi1_eq_1_i32(i1 %cond) {		define i64 @fold_icmp_sext_argi1_eq_1_i32(i1 %cond) {
; CHECK-LABEL: @fold_icmp_sext_argi1_eq_1_i32(		; CHECK-LABEL: @fold_icmp_sext_argi1_eq_1_i32(
; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i32		; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i32
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[SEXT_COND]], i32 1, i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_COND]], i32 1, i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%sext.cond = sext i1 %cond to i32		%sext.cond = sext i1 %cond to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %sext.cond, i32 1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cond, i32 1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_sext_argi1_eq_neg1_i32(i1 %cond) {		define i64 @fold_icmp_sext_argi1_eq_neg1_i32(i1 %cond) {
; CHECK-LABEL: @fold_icmp_sext_argi1_eq_neg1_i32(		; CHECK-LABEL: @fold_icmp_sext_argi1_eq_neg1_i32(
; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i32		; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i32
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[SEXT_COND]], i32 0, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_COND]], i32 0, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%sext.cond = sext i1 %cond to i32		%sext.cond = sext i1 %cond to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %sext.cond, i32 -1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cond, i32 -1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_sext_argi1_eq_neg1_i64(i1 %cond) {		define i64 @fold_icmp_sext_argi1_eq_neg1_i64(i1 %cond) {
; CHECK-LABEL: @fold_icmp_sext_argi1_eq_neg1_i64(		; CHECK-LABEL: @fold_icmp_sext_argi1_eq_neg1_i64(
; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i64		; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i64
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64(i64 [[SEXT_COND]], i64 0, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[SEXT_COND]], i64 0, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%sext.cond = sext i1 %cond to i64		%sext.cond = sext i1 %cond to i64
%mask = call i64 @llvm.amdgcn.icmp.i64(i64 %sext.cond, i64 -1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i64(i64 %sext.cond, i64 -1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

; TODO: Should be able to fold to false		; TODO: Should be able to fold to false
define i64 @fold_icmp_sext_icmp_eq_1_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_sext_icmp_eq_1_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_sext_icmp_eq_1_i32(		; CHECK-LABEL: @fold_icmp_sext_icmp_eq_1_i32(
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[SEXT_CMP:%.*]] = sext i1 [[CMP]] to i32		; CHECK-NEXT: [[SEXT_CMP:%.*]] = sext i1 [[CMP]] to i32
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[SEXT_CMP]], i32 1, i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_CMP]], i32 1, i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i32 %a, %b		%cmp = icmp eq i32 %a, %b
%sext.cmp = sext i1 %cmp to i32		%sext.cmp = sext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %sext.cmp, i32 1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_sext_icmp_eq_neg1_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_sext_icmp_eq_neg1_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_sext_icmp_eq_neg1_i32(		; CHECK-LABEL: @fold_icmp_sext_icmp_eq_neg1_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i32 %a, %b		%cmp = icmp eq i32 %a, %b
%sext.cmp = sext i1 %cmp to i32		%sext.cmp = sext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %sext.cmp, i32 -1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 -1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_sext_icmp_sge_neg1_i32(i32 %a, i32 %b) {		define i64 @fold_icmp_sext_icmp_sge_neg1_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_sext_icmp_sge_neg1_i32(		; CHECK-LABEL: @fold_icmp_sext_icmp_sge_neg1_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 39)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 39)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp sge i32 %a, %b		%cmp = icmp sge i32 %a, %b
%sext.cmp = sext i1 %cmp to i32		%sext.cmp = sext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %sext.cmp, i32 -1, i32 32)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 -1, i32 32)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_not_icmp_ne_0_zext_icmp_sle_i32(i32 %a, i32 %b) {		define i64 @fold_not_icmp_ne_0_zext_icmp_sle_i32(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_not_icmp_ne_0_zext_icmp_sle_i32(		; CHECK-LABEL: @fold_not_icmp_ne_0_zext_icmp_sle_i32(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 38)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 38)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp sle i32 %a, %b		%cmp = icmp sle i32 %a, %b
%not = xor i1 %cmp, true		%not = xor i1 %cmp, true
%zext.cmp = zext i1 %not to i32		%zext.cmp = zext i1 %not to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_eq_i4(i4 %a, i4 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_eq_i4(i4 %a, i4 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i4(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i4(
; CHECK-NEXT: [[TMP1:%.*]] = zext i4 [[A:%.*]] to i16		; CHECK-NEXT: [[TMP1:%.*]] = zext i4 [[A:%.*]] to i16
; CHECK-NEXT: [[TMP2:%.*]] = zext i4 [[B:%.*]] to i16		; CHECK-NEXT: [[TMP2:%.*]] = zext i4 [[B:%.*]] to i16
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[TMP1]], i16 [[TMP2]], i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i4 %a, %b		%cmp = icmp eq i4 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_eq_i8(i8 %a, i8 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_eq_i8(i8 %a, i8 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i8(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i8(
; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A:%.*]] to i16		; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[B:%.*]] to i16		; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[B:%.*]] to i16
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[TMP1]], i16 [[TMP2]], i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i8 %a, %b		%cmp = icmp eq i8 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_eq_i16(i16 %a, i16 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_eq_i16(i16 %a, i16 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i16(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i16(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i16 %a, %b		%cmp = icmp eq i16 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_eq_i36(i36 %a, i36 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_eq_i36(i36 %a, i36 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i36(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i36(
; CHECK-NEXT: [[TMP1:%.*]] = zext i36 [[A:%.*]] to i64		; CHECK-NEXT: [[TMP1:%.*]] = zext i36 [[A:%.*]] to i64
; CHECK-NEXT: [[TMP2:%.*]] = zext i36 [[B:%.*]] to i64		; CHECK-NEXT: [[TMP2:%.*]] = zext i36 [[B:%.*]] to i64
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64(i64 [[TMP1]], i64 [[TMP2]], i32 32)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[TMP1]], i64 [[TMP2]], i32 32)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i36 %a, %b		%cmp = icmp eq i36 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_eq_i128(i128 %a, i128 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_eq_i128(i128 %a, i128 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i128(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i128(
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[ZEXT_CMP:%.*]] = zext i1 [[CMP]] to i32		; CHECK-NEXT: [[ZEXT_CMP:%.*]] = zext i1 [[CMP]] to i32
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[ZEXT_CMP]], i32 0, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_CMP]], i32 0, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i128 %a, %b		%cmp = icmp eq i128 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f16(half %a, half %b) {		define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f16(half %a, half %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f16(		; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f16(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.f16(half [[A:%.*]], half [[B:%.*]], i32 1)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f16(half [[A:%.*]], half [[B:%.*]], i32 1)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp oeq half %a, %b		%cmp = fcmp oeq half %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f128(fp128 %a, fp128 %b) {		define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f128(fp128 %a, fp128 %b) {
;
; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f128(		; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f128(
; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq fp128 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq fp128 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[ZEXT_CMP:%.*]] = zext i1 [[CMP]] to i32		; CHECK-NEXT: [[ZEXT_CMP:%.*]] = zext i1 [[CMP]] to i32
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i32(i32 [[ZEXT_CMP]], i32 0, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_CMP]], i32 0, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp oeq fp128 %a, %b		%cmp = fcmp oeq fp128 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_slt_i4(i4 %a, i4 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_slt_i4(i4 %a, i4 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i4(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i4(
; CHECK-NEXT: [[TMP1:%.*]] = sext i4 [[A:%.*]] to i16		; CHECK-NEXT: [[TMP1:%.*]] = sext i4 [[A:%.*]] to i16
; CHECK-NEXT: [[TMP2:%.*]] = sext i4 [[B:%.*]] to i16		; CHECK-NEXT: [[TMP2:%.*]] = sext i4 [[B:%.*]] to i16
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[TMP1]], i16 [[TMP2]], i32 40)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 40)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp slt i4 %a, %b		%cmp = icmp slt i4 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_slt_i8(i8 %a, i8 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_slt_i8(i8 %a, i8 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i8(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i8(
; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[A:%.*]] to i16		; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[TMP2:%.*]] = sext i8 [[B:%.*]] to i16		; CHECK-NEXT: [[TMP2:%.*]] = sext i8 [[B:%.*]] to i16
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[TMP1]], i16 [[TMP2]], i32 40)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 40)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp slt i8 %a, %b		%cmp = icmp slt i8 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_slt_i16(i16 %a, i16 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_slt_i16(i16 %a, i16 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i16(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i16(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 40)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 40)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp slt i16 %a, %b		%cmp = icmp slt i16 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_ult_i4(i4 %a, i4 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_ult_i4(i4 %a, i4 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i4(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i4(
; CHECK-NEXT: [[TMP1:%.*]] = zext i4 [[A:%.*]] to i16		; CHECK-NEXT: [[TMP1:%.*]] = zext i4 [[A:%.*]] to i16
; CHECK-NEXT: [[TMP2:%.*]] = zext i4 [[B:%.*]] to i16		; CHECK-NEXT: [[TMP2:%.*]] = zext i4 [[B:%.*]] to i16
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[TMP1]], i16 [[TMP2]], i32 36)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 36)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ult i4 %a, %b		%cmp = icmp ult i4 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_ult_i8(i8 %a, i8 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_ult_i8(i8 %a, i8 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i8(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i8(
; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A:%.*]] to i16		; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[B:%.*]] to i16		; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[B:%.*]] to i16
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[TMP1]], i16 [[TMP2]], i32 36)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 36)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ult i8 %a, %b		%cmp = icmp ult i8 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_ne_0_zext_icmp_ult_i16(i16 %a, i16 %b) {		define i64 @fold_icmp_ne_0_zext_icmp_ult_i16(i16 %a, i16 %b) {
; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i16(		; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i16(
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 36)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 36)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ult i16 %a, %b		%cmp = icmp ult i16 %a, %b
%zext.cmp = zext i1 %cmp to i32		%zext.cmp = zext i1 %cmp to i32
%mask = call i64 @llvm.amdgcn.icmp.i32(i32 %zext.cmp, i32 0, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
ret i64 %mask		ret i64 %mask
}		}

; 1-bit NE comparisons		; 1-bit NE comparisons

define i64 @fold_icmp_i1_ne_0_icmp_eq_i1(i32 %a, i32 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_eq_i1(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i1(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i1(
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i32 %a, %b		%cmp = icmp eq i32 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_ne_i1(i32 %a, i32 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_ne_i1(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ne_i1(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ne_i1(
; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ne i32 %a, %b		%cmp = icmp ne i32 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_sle_i1(i32 %a, i32 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_sle_i1(i32 %a, i32 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_sle_i1(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_sle_i1(
; CHECK-NEXT: [[CMP:%.*]] = icmp sle i32 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp sle i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp sle i32 %a, %b		%cmp = icmp sle i32 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_ugt_i64(i64 %a, i64 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_ugt_i64(i64 %a, i64 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ugt_i64(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ugt_i64(
; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ugt i64 %a, %b		%cmp = icmp ugt i64 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_ult_swap_i64(i64 %a, i64 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_ult_swap_i64(i64 %a, i64 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_swap_i64(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_swap_i64(
; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ugt i64 %a, %b		%cmp = icmp ugt i64 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 false, i1 %cmp, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 false, i1 %cmp, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f32(float %a, float %b) {		define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f32(float %a, float %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f32(		; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f32(
; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq float [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq float [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp oeq float %a, %b		%cmp = fcmp oeq float %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_fcmp_une_f32(float %a, float %b) {		define i64 @fold_icmp_i1_ne_0_fcmp_une_f32(float %a, float %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_une_f32(		; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_une_f32(
; CHECK-NEXT: [[CMP:%.*]] = fcmp une float [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp une float [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp une float %a, %b		%cmp = fcmp une float %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_fcmp_olt_f64(double %a, double %b) {		define i64 @fold_icmp_i1_ne_0_fcmp_olt_f64(double %a, double %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_olt_f64(		; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_olt_f64(
; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp olt double %a, %b		%cmp = fcmp olt double %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_eq_i4(i4 %a, i4 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_eq_i4(i4 %a, i4 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i4(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i4(
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i4 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i4 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i4 %a, %b		%cmp = icmp eq i4 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_eq_i8(i8 %a, i8 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_eq_i8(i8 %a, i8 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i8(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i8(
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i8 %a, %b		%cmp = icmp eq i8 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_eq_i16(i16 %a, i16 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_eq_i16(i16 %a, i16 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i16(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i16(
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i16 %a, %b		%cmp = icmp eq i16 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_eq_i36(i36 %a, i36 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_eq_i36(i36 %a, i36 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i36(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i36(
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i36 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i36 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i36 %a, %b		%cmp = icmp eq i36 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_eq_i128(i128 %a, i128 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_eq_i128(i128 %a, i128 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i128(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i128(
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp eq i128 %a, %b		%cmp = icmp eq i128 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f16(half %a, half %b) {		define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f16(half %a, half %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f16(		; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f16(
; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq half [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq half [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp oeq half %a, %b		%cmp = fcmp oeq half %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f128(fp128 %a, fp128 %b) {		define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f128(fp128 %a, fp128 %b) {
;
; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f128(		; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f128(
; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq fp128 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq fp128 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = fcmp oeq fp128 %a, %b		%cmp = fcmp oeq fp128 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_slt_i4(i4 %a, i4 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_slt_i4(i4 %a, i4 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i4(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i4(
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i4 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i4 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp slt i4 %a, %b		%cmp = icmp slt i4 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_slt_i8(i8 %a, i8 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_slt_i8(i8 %a, i8 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i8(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i8(
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp slt i8 %a, %b		%cmp = icmp slt i8 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_slt_i16(i16 %a, i16 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_slt_i16(i16 %a, i16 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i16(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i16(
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp slt i16 %a, %b		%cmp = icmp slt i16 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_ult_i4(i4 %a, i4 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_ult_i4(i4 %a, i4 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i4(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i4(
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i4 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp ult i4 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ult i4 %a, %b		%cmp = icmp ult i4 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_ult_i8(i8 %a, i8 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_ult_i8(i8 %a, i8 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i8(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i8(
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i8 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp ult i8 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ult i8 %a, %b		%cmp = icmp ult i8 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

define i64 @fold_icmp_i1_ne_0_icmp_ult_i16(i16 %a, i16 %b) {		define i64 @fold_icmp_i1_ne_0_icmp_ult_i16(i16 %a, i16 %b) {
; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i16(		; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i16(
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i16 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp ult i16 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i1(i1 [[CMP]], i1 false, i32 33)		; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
; CHECK-NEXT: ret i64 [[MASK]]		; CHECK-NEXT: ret i64 [[MASK]]
;		;
%cmp = icmp ult i16 %a, %b		%cmp = icmp ult i16 %a, %b
%mask = call i64 @llvm.amdgcn.icmp.i1(i1 %cmp, i1 false, i32 33)		%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
ret i64 %mask		ret i64 %mask
}		}

; --------------------------------------------------------------------		; --------------------------------------------------------------------
; llvm.amdgcn.fcmp		; llvm.amdgcn.fcmp
; --------------------------------------------------------------------		; --------------------------------------------------------------------

declare i64 @llvm.amdgcn.fcmp.f32(float, float, i32 immarg) nounwind readnone convergent		declare i64 @llvm.amdgcn.fcmp.i64.f32(float, float, i32 immarg) nounwind readnone convergent

define i64 @invalid_fcmp_code(float %a, float %b) {		define i64 @invalid_fcmp_code(float %a, float %b) {
; CHECK-LABEL: @invalid_fcmp_code(		; CHECK-LABEL: @invalid_fcmp_code(
; CHECK-NEXT: [[UNDER:%.*]] = call i64 @llvm.amdgcn.fcmp.f32(float [[A:%.*]], float [[B:%.*]], i32 -1)		; CHECK-NEXT: [[UNDER:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 -1)
; CHECK-NEXT: [[OVER:%.*]] = call i64 @llvm.amdgcn.fcmp.f32(float [[A]], float [[B]], i32 16)		; CHECK-NEXT: [[OVER:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 16)
; CHECK-NEXT: [[OR:%.*]] = or i64 [[UNDER]], [[OVER]]		; CHECK-NEXT: [[OR:%.*]] = or i64 [[UNDER]], [[OVER]]
; CHECK-NEXT: ret i64 [[OR]]		; CHECK-NEXT: ret i64 [[OR]]
;		;
%under = call i64 @llvm.amdgcn.fcmp.f32(float %a, float %b, i32 -1)		%under = call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 -1)
%over = call i64 @llvm.amdgcn.fcmp.f32(float %a, float %b, i32 16)		%over = call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 16)
%or = or i64 %under, %over		%or = or i64 %under, %over
ret i64 %or		ret i64 %or
}		}

define i64 @fcmp_constant_inputs_false() {		define i64 @fcmp_constant_inputs_false() {
; CHECK-LABEL: @fcmp_constant_inputs_false(		; CHECK-LABEL: @fcmp_constant_inputs_false(
; CHECK-NEXT: ret i64 0		; CHECK-NEXT: ret i64 0
;		;
%result = call i64 @llvm.amdgcn.fcmp.f32(float 2.0, float 4.0, i32 1)		%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 2.0, float 4.0, i32 1)
ret i64 %result		ret i64 %result
}		}

define i64 @fcmp_constant_inputs_true() {		define i64 @fcmp_constant_inputs_true() {
; CHECK-LABEL: @fcmp_constant_inputs_true(		; CHECK-LABEL: @fcmp_constant_inputs_true(
; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata !0) #5		; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata !0) #5
; CHECK-NEXT: ret i64 [[RESULT]]		; CHECK-NEXT: ret i64 [[RESULT]]
;		;
%result = call i64 @llvm.amdgcn.fcmp.f32(float 2.0, float 4.0, i32 4)		%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 2.0, float 4.0, i32 4)
ret i64 %result		ret i64 %result
}		}

define i64 @fcmp_constant_to_rhs_olt(float %x) {		define i64 @fcmp_constant_to_rhs_olt(float %x) {
; CHECK-LABEL: @fcmp_constant_to_rhs_olt(		; CHECK-LABEL: @fcmp_constant_to_rhs_olt(
; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.fcmp.f32(float [[X:%.*]], float 4.000000e+00, i32 2)		; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[X:%.*]], float 4.000000e+00, i32 2)
; CHECK-NEXT: ret i64 [[RESULT]]		; CHECK-NEXT: ret i64 [[RESULT]]
;		;
%result = call i64 @llvm.amdgcn.fcmp.f32(float 4.0, float %x, i32 4)		%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 4.0, float %x, i32 4)
ret i64 %result		ret i64 %result
}		}

; --------------------------------------------------------------------		; --------------------------------------------------------------------
; llvm.amdgcn.wqm.vote		; llvm.amdgcn.wqm.vote
; --------------------------------------------------------------------		; --------------------------------------------------------------------

declare i1 @llvm.amdgcn.wqm.vote(i1)		declare i1 @llvm.amdgcn.wqm.vote(i1)

test/Verifier/AMDGPU/intrinsic-immarg.ll

define void @exp_compr_invalid_inputs(i32 %tgt, i32 %en, i1 %bool) {

; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i1 %bool		; CHECK-NEXT: i1 %bool
; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 5, <2 x half> <half 0xH3C00, half 0xH4000>, <2 x half> <half 0xH3800, half 0xH4400>, i1 false, i1 %bool)		; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 5, <2 x half> <half 0xH3C00, half 0xH4000>, <2 x half> <half 0xH3800, half 0xH4400>, i1 false, i1 %bool)
call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 5, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 false, i1 %bool)		call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 5, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 false, i1 %bool)
ret void		ret void
}		}

declare i64 @llvm.amdgcn.icmp.i32(i32, i32, i32)		declare i64 @llvm.amdgcn.icmp.i64.i32(i32, i32, i32)

define i64 @invalid_nonconstant_icmp_code(i32 %a, i32 %b, i32 %c) {		define i64 @invalid_nonconstant_icmp_code(i32 %a, i32 %b, i32 %c) {
; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i32 %c		; CHECK-NEXT: i32 %c
; CHECK-NEXT: %result = call i64 @llvm.amdgcn.icmp.i32(i32 %a, i32 %b, i32 %c)		; CHECK-NEXT: %result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 %c)
%result = call i64 @llvm.amdgcn.icmp.i32(i32 %a, i32 %b, i32 %c)		%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 %c)
ret i64 %result		ret i64 %result
}		}

declare i64 @llvm.amdgcn.fcmp.f32(float, float, i32)		declare i64 @llvm.amdgcn.fcmp.i64.f32(float, float, i32)
define i64 @invalid_nonconstant_fcmp_code(float %a, float %b, i32 %c) {		define i64 @invalid_nonconstant_fcmp_code(float %a, float %b, i32 %c) {
; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i32 %c		; CHECK-NEXT: i32 %c
; CHECK-NEXT: %result = call i64 @llvm.amdgcn.fcmp.f32(float %a, float %b, i32 %c)		; CHECK-NEXT: %result = call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 %c)
%result = call i64 @llvm.amdgcn.fcmp.f32(float %a, float %b, i32 %c)		%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 %c)
ret i64 %result		ret i64 %result
}		}

declare i32 @llvm.amdgcn.atomic.inc.i32.p3i32(i32 addrspace(3)* nocapture, i32, i32, i32, i1)		declare i32 @llvm.amdgcn.atomic.inc.i32.p3i32(i32 addrspace(3)* nocapture, i32, i32, i32, i1)
define amdgpu_kernel void @invalid_atomic_inc(i32 addrspace(1)* %out, i32 addrspace(3)* %ptr, i32 %var, i1 %bool) {		define amdgpu_kernel void @invalid_atomic_inc(i32 addrspace(1)* %out, i32 addrspace(3)* %ptr, i32 %var, i1 %bool) {
; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i32 %var		; CHECK-NEXT: i32 %var
; CHECK-NEXT: %result0 = call i32 @llvm.amdgcn.atomic.inc.i32.p3i32(i32 addrspace(3)* %ptr, i32 42, i32 %var, i32 0, i1 false)		; CHECK-NEXT: %result0 = call i32 @llvm.amdgcn.atomic.inc.i32.p3i32(i32 addrspace(3)* %ptr, i32 42, i32 %var, i32 0, i1 false)

Revision Contents

Diff 204635

include/llvm/IR/IntrinsicsAMDGPU.td

lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

lib/Target/AMDGPU/AMDGPUSubtarget.h

lib/Target/AMDGPU/SIAnnotateControlFlow.cpp

lib/Target/AMDGPU/SIISelLowering.cpp

lib/Target/AMDGPU/SIInstructions.td

lib/Transforms/InstCombine/InstCombineCalls.cpp

test/CodeGen/AMDGPU/diverge-switch-default.ll

test/CodeGen/AMDGPU/loop_break.ll

test/CodeGen/AMDGPU/multi-divergent-exit-region.ll

test/CodeGen/AMDGPU/multilevel-break.ll

test/CodeGen/AMDGPU/nested-loop-conditions.ll

test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll

test/CodeGen/AMDGPU/si-annotatecfg-multiple-backedges.ll

test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

test/Verifier/AMDGPU/intrinsic-immarg.ll
