This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Enable multiple icmp when tail folding
Needs ReviewPublic

Authored by samparker on Sep 17 2020, 5:30 AM.

Download Raw Diff

Details

Reviewers

SjoerdMeijer
dmgreen

Summary

Now that we are handling VCMP and VPT during the tail predication, we should allow more than a single icmp when deciding when to tail fold or not.

Diff Detail

Event Timeline

samparker created this revision.Sep 17 2020, 5:30 AM

Herald added a project: Restricted Project. · View Herald TranscriptSep 17 2020, 5:30 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. · View Herald Transcript

samparker requested review of this revision.Sep 17 2020, 5:30 AM

samparker added a parent revision: D87819: [ARM] Find VPT implicitly predicated by VCTP.

Harbormaster completed remote builds in B72002: Diff 292470.Sep 17 2020, 6:04 AM

I have the feeling that this was protecting us from more than just things that would become vpt blocks. I think we have a check that the original was only a single loop? That would keep things simpler at least, to mostly icmp + select (and maybe zext(icmp), but as far as I understand that should be fine). Things like min/max/abs should all work OK, and select on their own. But what about the saturating intrinsics we have? vqmovn for example.

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1592	Fcmp too, I would expect could be treated the same way as icmp.
llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll
572	Apparently we have two files for these, and they should now maybe belong in tail-folding-allowed.ll

original was only a single loop?

Do you mean a single block loop? If so, I thought the the vectorizer would perform if-conversion after calling this? And for saturation, does the vectorizer not yet consume saturating intrinsics? Are all of ours ones that merge/retain previous values, maybe their idiom could be checked for?

In D87826#2281286, @samparker wrote:

original was only a single loop?

Do you mean a single block loop? If so, I thought the the vectorizer would perform if-conversion after calling this? And for saturation, does the vectorizer not yet consume saturating intrinsics? Are all of ours ones that merge/retain previous values, maybe their idiom could be checked for?

Yeah sorry. Single block loops. So there should be no ifconversion, and we will only be dealing with icmp+select. That will keep things simpler - less complex loop structures to worry about.

There are many forms of saturating instructions. In this case I was just talking about something that does min(max(x, -128), 127) or an i16, for example. We turn that into a vqmovn in the backend, with some bitcasts for the types. I had presumed that the "n" in there would be a problem, but they look like they are marked as valid of tail pred, even if they do change size.

It is still probably worth checking some examples, to make sure they are still doing as well with this change.

Revision Contents

Path

Size

llvm/

lib/

Target/

ARM/

ARMTargetTransformInfo.cpp

21 lines

test/

Transforms/

LoopVectorize/

ARM/

mve-reduction-predselect.ll

64 lines

prefer-tail-loop-folding.ll

8 lines

tail-fold-multiple-icmps.ll

56 lines

tail-folding-not-allowed.ll

160 lines

Diff 292470

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

Show First 20 Lines • Show All 1,582 Lines • ▼ Show 20 Lines	bool ARMTTIImpl::isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,
HWLoopInfo.CounterInReg = true;		HWLoopInfo.CounterInReg = true;
HWLoopInfo.IsNestingLegal = false;		HWLoopInfo.IsNestingLegal = false;
HWLoopInfo.PerformEntryTest = true;		HWLoopInfo.PerformEntryTest = true;
HWLoopInfo.CountType = Type::getInt32Ty(C);		HWLoopInfo.CountType = Type::getInt32Ty(C);
HWLoopInfo.LoopDecrement = ConstantInt::get(HWLoopInfo.CountType, 1);		HWLoopInfo.LoopDecrement = ConstantInt::get(HWLoopInfo.CountType, 1);
return true;		return true;
}		}

static bool canTailPredicateInstruction(Instruction &I, int &ICmpCount) {		static bool canTailPredicateInstruction(Instruction &I) {
// We don't allow icmp's, and because we only look at single block loops,
// we simply count the icmps, i.e. there should only be 1 for the backedge.
if (isa<ICmpInst>(&I) && ++ICmpCount > 1)
return false;

if (isa<FCmpInst>(&I))		if (isa<FCmpInst>(&I))
		dmgreenUnsubmitted Not Done Reply Inline Actions Fcmp too, I would expect could be treated the same way as icmp. dmgreen: Fcmp too, I would expect could be treated the same way as icmp.
return false;		return false;

// We could allow extending/narrowing FP loads/stores, but codegen is		// We could allow extending/narrowing FP loads/stores, but codegen is
// too inefficient so reject this for now.		// too inefficient so reject this for now.
if (isa<FPExtInst>(&I) \|\| isa<FPTruncInst>(&I))		if (isa<FPExtInst>(&I) \|\| isa<FPTruncInst>(&I))
return false;		return false;

// Extends have to be extending-loads		// Extends have to be extending-loads
Show All 32 Lines	static bool canTailPredicateLoop(Loop L, LoopInfo LI, ScalarEvolution &SE,
// vectorizer cannot tailfold the loop), we should fall back to standard		// vectorizer cannot tailfold the loop), we should fall back to standard
// vectorization automatically.		// vectorization automatically.
SmallVector< Instruction *, 8 > LiveOuts;		SmallVector< Instruction *, 8 > LiveOuts;
LiveOuts = llvm::findDefsUsedOutsideOfLoop(L);		LiveOuts = llvm::findDefsUsedOutsideOfLoop(L);
bool ReductionsDisabled =		bool ReductionsDisabled =
EnableTailPredication == TailPredication::EnabledNoReductions \|\|		EnableTailPredication == TailPredication::EnabledNoReductions \|\|
EnableTailPredication == TailPredication::ForceEnabledNoReductions;		EnableTailPredication == TailPredication::ForceEnabledNoReductions;

		if (ReductionsDisabled && !LiveOuts.empty()) {
		LLVM_DEBUG(dbgs() << "Reductions not enabled\n");
		return false;
		}

for (auto *I : LiveOuts) {		for (auto *I : LiveOuts) {
if (!I->getType()->isIntegerTy() && !I->getType()->isFloatTy() &&		if (!I->getType()->isIntegerTy() && !I->getType()->isFloatTy() &&
!I->getType()->isHalfTy()) {		!I->getType()->isHalfTy()) {
LLVM_DEBUG(dbgs() << "Don't tail-predicate loop with non-integer/float "		LLVM_DEBUG(dbgs() << "Don't tail-predicate loop with non-integer/float "
"live-out value\n");		"live-out value\n");
return false;		return false;
}		}
if (ReductionsDisabled) {
LLVM_DEBUG(dbgs() << "Reductions not enabled\n");
return false;
}
}		}

// Next, check that all instructions can be tail-predicated.		// Next, check that all instructions can be tail-predicated.
PredicatedScalarEvolution PSE = LAI->getPSE();		PredicatedScalarEvolution PSE = LAI->getPSE();
SmallVector<Instruction *, 16> LoadStores;		SmallVector<Instruction *, 16> LoadStores;
int ICmpCount = 0;

for (BasicBlock *BB : L->blocks()) {		for (BasicBlock *BB : L->blocks()) {
for (Instruction &I : BB->instructionsWithoutDebug()) {		for (Instruction &I : BB->instructionsWithoutDebug()) {
if (isa<PHINode>(&I))		if (isa<PHINode>(&I))
continue;		continue;
if (!canTailPredicateInstruction(I, ICmpCount)) {		if (!canTailPredicateInstruction(I)) {
LLVM_DEBUG(dbgs() << "Instruction not allowed: "; I.dump());		LLVM_DEBUG(dbgs() << "Instruction not allowed: "; I.dump());
return false;		return false;
}		}

Type *T = I.getType();		Type *T = I.getType();
if (T->isPointerTy())		if (T->isPointerTy())
T = T->getPointerElementType();		T = T->getPointerElementType();

Show All 25 Lines	for (Instruction &I : BB->instructionsWithoutDebug()) {
const SCEV *PtrScev =		const SCEV *PtrScev =
replaceSymbolicStrideSCEV(PSE, llvm::ValueToValueMap(), Ptr);		replaceSymbolicStrideSCEV(PSE, llvm::ValueToValueMap(), Ptr);
if (auto AR = dyn_cast<SCEVAddRecExpr>(PtrScev)) {		if (auto AR = dyn_cast<SCEVAddRecExpr>(PtrScev)) {
const SCEV Step = AR->getStepRecurrence(PSE.getSE());		const SCEV Step = AR->getStepRecurrence(PSE.getSE());
if (PSE.getSE()->isLoopInvariant(Step, L))		if (PSE.getSE()->isLoopInvariant(Step, L))
continue;		continue;
}		}
}		}
LLVM_DEBUG(dbgs() << "Bad stride found, can't "		LLVM_DEBUG(dbgs() << "Bad stride (" << NextStride << ") found, can't "
		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code - LLVM_DEBUG(dbgs() << "Bad stride (" << NextStride << ") found, can't " + LLVM_DEBUG(dbgs() << "Bad stride (" << NextStride + << ") found, can't " Lint: Pre-merge checks: clang-format: please reformat the code ``` - LLVM_DEBUG(dbgs() << "Bad stride (" <<…
"tail-predicate\n.");		"tail-predicate\n.");
return false;		return false;
}		}
}		}
}		}

LLVM_DEBUG(dbgs() << "tail-predication: all instructions allowed!\n");		LLVM_DEBUG(dbgs() << "tail-predication: all instructions allowed!\n");
return true;		return true;
▲ Show 20 Lines • Show All 165 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll

	Show First 20 Lines • Show All 435 Lines • ▼ Show 20 Lines
	define i32 @reduction_min(i32* nocapture %A, i32* nocapture %B) {			define i32 @reduction_min(i32* nocapture %A, i32* nocapture %B) {
	; CHECK-LABEL: @reduction_min(			; CHECK-LABEL: @reduction_min(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 1000, i32 1000, i32 1000, i32 1000>, [[VECTOR_PH]] ], [ [[TMP3:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 1000, i32 1000, i32 1000, i32 1000>, [[VECTOR_PH]] ], [ [[TMP4:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.]] = getelementptr inbounds i32, i32 [[A:%.*]], i32 [[INDEX]]			; CHECK-NEXT: [[TMP0:%.]] = getelementptr inbounds i32, i32 [[A:%.*]], i32 [[INDEX]]
				; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[INDEX]], i32 257)
	; CHECK-NEXT: [[TMP1:%.]] = bitcast i32 [[TMP0]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.]] = bitcast i32 [[TMP0]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP1]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> undef)
	; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <4 x i32> [[VEC_PHI]], [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <4 x i32> [[VEC_PHI]], [[WIDE_MASKED_LOAD]]
	; CHECK-NEXT: [[TMP3]] = select <4 x i1> [[TMP2]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_MASKED_LOAD]]
				; CHECK-NEXT: [[TMP4]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP3]], <4 x i32> [[VEC_PHI]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 260
	; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP18:!llvm.loop !.]]			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP18:!llvm.loop !.]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.smin.v4i32(<4 x i32> [[TMP3]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.experimental.vector.reduce.smin.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.]] = phi i32 [ 256, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP5]], [[MIDDLE_BLOCK]] ], [ 1000, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.]] = phi i32 [ [[INDVARS_IV_NEXT:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: br i1 undef, label [[FOR_END]], label [[FOR_BODY]], [[LOOP19:!llvm.loop !.*]]
	; CHECK-NEXT: [[RESULT_08:%.]] = phi i32 [ [[V0:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[A]], i32 [[INDVARS_IV]]
	; CHECK-NEXT: [[L0:%.]] = load i32, i32 [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[RESULT_08]], [[L0]]
	; CHECK-NEXT: [[V0]] = select i1 [[C0]], i32 [[RESULT_08]], i32 [[L0]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i32 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INDVARS_IV_NEXT]], 257
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP19:!llvm.loop !.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi i32 [ [[V0]], [[FOR_BODY]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi i32 [ undef, [[FOR_BODY]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[RESULT_0_LCSSA]]			; CHECK-NEXT: ret i32 [[RESULT_0_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i32 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i32 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%result.08 = phi i32 [ %v0, %for.body ], [ 1000, %entry ]			%result.08 = phi i32 [ %v0, %for.body ], [ 1000, %entry ]
	Show All 13 Lines
	define i32 @reduction_max(i32* nocapture %A, i32* nocapture %B) {			define i32 @reduction_max(i32* nocapture %A, i32* nocapture %B) {
	; CHECK-LABEL: @reduction_max(			; CHECK-LABEL: @reduction_max(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 1000, i32 1000, i32 1000, i32 1000>, [[VECTOR_PH]] ], [ [[TMP3:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 1000, i32 1000, i32 1000, i32 1000>, [[VECTOR_PH]] ], [ [[TMP4:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.]] = getelementptr inbounds i32, i32 [[A:%.*]], i32 [[INDEX]]			; CHECK-NEXT: [[TMP0:%.]] = getelementptr inbounds i32, i32 [[A:%.*]], i32 [[INDEX]]
				; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[INDEX]], i32 257)
	; CHECK-NEXT: [[TMP1:%.]] = bitcast i32 [[TMP0]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.]] = bitcast i32 [[TMP0]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP1]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> undef)
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt <4 x i32> [[VEC_PHI]], [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt <4 x i32> [[VEC_PHI]], [[WIDE_MASKED_LOAD]]
	; CHECK-NEXT: [[TMP3]] = select <4 x i1> [[TMP2]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_MASKED_LOAD]]
				; CHECK-NEXT: [[TMP4]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP3]], <4 x i32> [[VEC_PHI]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 260
	; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP20:!llvm.loop !.]]			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP20:!llvm.loop !.]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.umax.v4i32(<4 x i32> [[TMP3]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.experimental.vector.reduce.umax.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.]] = phi i32 [ 256, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP5]], [[MIDDLE_BLOCK]] ], [ 1000, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.]] = phi i32 [ [[INDVARS_IV_NEXT:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: br i1 undef, label [[FOR_END]], label [[FOR_BODY]], [[LOOP21:!llvm.loop !.*]]
	; CHECK-NEXT: [[RESULT_08:%.]] = phi i32 [ [[V0:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[A]], i32 [[INDVARS_IV]]
	; CHECK-NEXT: [[L0:%.]] = load i32, i32 [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[C0:%.*]] = icmp ugt i32 [[RESULT_08]], [[L0]]
	; CHECK-NEXT: [[V0]] = select i1 [[C0]], i32 [[RESULT_08]], i32 [[L0]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i32 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INDVARS_IV_NEXT]], 257
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP21:!llvm.loop !.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi i32 [ [[V0]], [[FOR_BODY]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[RESULT_0_LCSSA:%.*]] = phi i32 [ undef, [[FOR_BODY]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[RESULT_0_LCSSA]]			; CHECK-NEXT: ret i32 [[RESULT_0_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i32 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i32 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%result.08 = phi i32 [ %v0, %for.body ], [ 1000, %entry ]			%result.08 = phi i32 [ %v0, %for.body ], [ 1000, %entry ]
	▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/ARM/prefer-tail-loop-folding.ll

Show First 20 Lines • Show All 198 Lines • ▼ Show 20 Lines	for.body:
store i8 %conv, i8* %arrayidx2, align 1		store i8 %conv, i8* %arrayidx2, align 1
%add3 = add nuw nsw i32 %i.09, 1		%add3 = add nuw nsw i32 %i.09, 1
%exitcond = icmp eq i32 %add3, 431		%exitcond = icmp eq i32 %add3, 431
br i1 %exitcond, label %for.cond.cleanup, label %for.body		br i1 %exitcond, label %for.cond.cleanup, label %for.body
}		}

@tab = common global [32 x i8] zeroinitializer, align 1		@tab = common global [32 x i8] zeroinitializer, align 1

define i32 @icmp_not_allowed() #0 {		define i32 @icmp_now_allowed() #0 {
; CHECK-LABEL: icmp_not_allowed(		; CHECK-LABEL: icmp_now_allowed(
; PREFER-FOLDING: vector.body:		; PREFER-FOLDING: vector.body:
; PREFER-FOLDING-NOT: llvm.masked.load		; PREFER-FOLDING: llvm.masked.load
; PREFER-FOLDING-NOT: llvm.masked.store		; PREFER-FOLDING: llvm.masked.store
; PREFER-FOLDING: br i1 %{{.}}, label %{{.}}, label %vector.body		; PREFER-FOLDING: br i1 %{{.}}, label %{{.}}, label %vector.body
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]		%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08		%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
%0 = load i8, i8* %arrayidx, align 1		%0 = load i8, i8* %arrayidx, align 1
▲ Show 20 Lines • Show All 212 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/ARM/tail-fold-multiple-icmps.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve -tail-predication=enabled -loop-vectorize -instcombine -simplifycfg %s -S -o - \| FileCheck %s			; RUN: opt -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve -tail-predication=enabled -loop-vectorize -instcombine -simplifycfg %s -S -o - \| FileCheck %s

	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

	define arm_aapcs_vfpcc i32 @minmaxval4(i32* nocapture readonly %x, i32* nocapture %minp, i32 %N) {			define arm_aapcs_vfpcc i32 @minmaxval4(i32* nocapture readonly %x, i32* nocapture %minp, i32 %N) {
	; CHECK-LABEL: @minmaxval4(			; CHECK-LABEL: @minmaxval4(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP26_NOT:%.]] = icmp eq i32 [[N:%.]], 0			; CHECK-NEXT: [[CMP26_NOT:%.]] = icmp eq i32 [[N:%.]], 0
	; CHECK-NEXT: br i1 [[CMP26_NOT]], label [[FOR_COND_CLEANUP:%.]], label [[FOR_BODY_PREHEADER:%.]]			; CHECK-NEXT: br i1 [[CMP26_NOT]], label [[FOR_COND_CLEANUP:%.]], label [[VECTOR_PH:%.]]
	; CHECK: for.body.preheader:
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[N]], -4			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
				; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[N_RND_UP]], -4
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>, [[VECTOR_PH]] ], [ [[TMP5:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>, [[VECTOR_PH]] ], [ [[TMP5:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI1:%.]] = phi <4 x i32> [ <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, [[VECTOR_PH]] ], [ [[TMP3:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI1:%.]] = phi <4 x i32> [ <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, [[VECTOR_PH]] ], [ [[TMP7:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[INDEX]]			; CHECK-NEXT: [[TMP0:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[INDEX]]
				; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[INDEX]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.]] = bitcast i32 [[TMP0]] to <4 x i32>*			; CHECK-NEXT: [[TMP1:%.]] = bitcast i32 [[TMP0]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP1]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> undef)
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], [[VEC_PHI1]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI1]]
	; CHECK-NEXT: [[TMP3]] = select <4 x i1> [[TMP2]], <4 x i32> [[WIDE_LOAD]], <4 x i32> [[VEC_PHI1]]			; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i1> [[ACTIVE_LANE_MASK]], [[TMP3]]
	; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[WIDE_LOAD]], <4 x i32> [[VEC_PHI]]			; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[WIDE_MASKED_LOAD]], <4 x i32> [[VEC_PHI]]
				; CHECK-NEXT: [[TMP6:%.*]] = and <4 x i1> [[ACTIVE_LANE_MASK]], [[TMP2]]
				; CHECK-NEXT: [[TMP7]] = select <4 x i1> [[TMP6]], <4 x i32> [[WIDE_MASKED_LOAD]], <4 x i32> [[VEC_PHI1]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
	; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.]]			; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP3]])			; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP7]])
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.smin.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.experimental.vector.reduce.smin.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[N]]			; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP8]], [[MIDDLE_BLOCK]] ], [ 2147483647, [[FOR_BODY_PREHEADER]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i32 [ [[TMP7]], [[MIDDLE_BLOCK]] ], [ -2147483648, [[FOR_BODY_PREHEADER]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: [[MAX_0_LCSSA:%.]] = phi i32 [ -2147483648, [[ENTRY:%.]] ], [ [[COND:%.*]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[MAX_0_LCSSA:%.]] = phi i32 [ -2147483648, [[ENTRY:%.]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: [[MIN_0_LCSSA:%.]] = phi i32 [ 2147483647, [[ENTRY]] ], [ [[COND9:%.]], [[FOR_BODY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[MIN_0_LCSSA:%.*]] = phi i32 [ 2147483647, [[ENTRY]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: store i32 [[MIN_0_LCSSA]], i32* [[MINP:%.*]], align 4			; CHECK-NEXT: store i32 [[MIN_0_LCSSA]], i32* [[MINP:%.*]], align 4
	; CHECK-NEXT: ret i32 [[MAX_0_LCSSA]]			; CHECK-NEXT: ret i32 [[MAX_0_LCSSA]]
	; CHECK: for.body:
	; CHECK-NEXT: [[I_029:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[MIN_028:%.*]] = phi i32 [ [[COND9]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[MAX_027:%.*]] = phi i32 [ [[COND]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_029]]
	; CHECK-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP9]], [[MAX_027]]
	; CHECK-NEXT: [[COND]] = select i1 [[CMP1]], i32 [[TMP9]], i32 [[MAX_027]]
	; CHECK-NEXT: [[CMP4:%.*]] = icmp slt i32 [[TMP9]], [[MIN_028]]
	; CHECK-NEXT: [[COND9]] = select i1 [[CMP4]], i32 [[TMP9]], i32 [[MIN_028]]
	; CHECK-NEXT: [[INC]] = add nuw i32 [[I_029]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], [[LOOP2:!llvm.loop !.*]]
	;			;
	entry:			entry:
	%cmp26.not = icmp eq i32 %N, 0			%cmp26.not = icmp eq i32 %N, 0
	br i1 %cmp26.not, label %for.cond.cleanup, label %for.body			br i1 %cmp26.not, label %for.cond.cleanup, label %for.body

	for.cond.cleanup: ; preds = %for.body, %entry			for.cond.cleanup: ; preds = %for.body, %entry
	%max.0.lcssa = phi i32 [ -2147483648, %entry ], [ %cond, %for.body ]			%max.0.lcssa = phi i32 [ -2147483648, %entry ], [ %cond, %for.body ]
	%min.0.lcssa = phi i32 [ 2147483647, %entry ], [ %cond9, %for.body ]			%min.0.lcssa = phi i32 [ 2147483647, %entry ], [ %cond9, %for.body ]
	Show All 18 Lines

llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll

Show First 20 Lines • Show All 563 Lines • ▼ Show 20 Lines	for.body: ; preds = %for.body.preheader, %for.body
%cond = load i32, i32* %cond.in, align 4		%cond = load i32, i32* %cond.in, align 4
%arrayidx3 = getelementptr inbounds i32, i32* %A, i32 %i.011		%arrayidx3 = getelementptr inbounds i32, i32* %A, i32 %i.011
store i32 %cond, i32* %arrayidx3, align 4		store i32 %cond, i32* %arrayidx3, align 4
%inc = add nuw nsw i32 %i.011, 1		%inc = add nuw nsw i32 %i.011, 1
%exitcond.not = icmp eq i32 %inc, %N		%exitcond.not = icmp eq i32 %inc, %N
br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body		br i1 %exitcond.not, label %for.cond.cleanup.loopexit, label %for.body
}		}

define i32 @i32_smin_reduction(i32* nocapture readonly %x, i32 %n) #0 {		define i32 @i32_smin_reduction(i32* nocapture readonly %x, i32 %n) #0 {
		dmgreenUnsubmitted Not Done Reply Inline Actions Apparently we have two files for these, and they should now maybe belong in tail-folding-allowed.ll dmgreen: Apparently we have two files for these, and they should now maybe belong in tail-folding…
; CHECK-LABEL: @i32_smin_reduction(		; CHECK-LABEL: @i32_smin_reduction(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CMP6:%.]] = icmp sgt i32 [[N:%.]], 0		; CHECK-NEXT: [[CMP6:%.]] = icmp sgt i32 [[N:%.]], 0
; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.]], label [[FOR_COND_CLEANUP:%.]]		; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.]], label [[FOR_COND_CLEANUP:%.]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 4		; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>, [[VECTOR_PH]] ], [ [[TMP5:%.]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>, [[VECTOR_PH]] ], [ [[TMP6:%.]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[INDEX]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0		; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[TMP0]]		; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[TMP0]]
		; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i32 0		; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i32 0
; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*		; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*
; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP3]], align 4		; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> undef)
; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[VEC_PHI]], [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[VEC_PHI]], [[WIDE_MASKED_LOAD]]
; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_MASKED_LOAD]]
		; CHECK-NEXT: [[TMP6]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP5]], <4 x i32> [[VEC_PHI]]
; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4		; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.]]		; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.experimental.vector.reduce.smin.v4i32(<4 x i32> [[TMP5]])		; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.smin.v4i32(<4 x i32> [[TMP6]])
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]		; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 2147483647, [[FOR_BODY_PREHEADER]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 2147483647, [[FOR_BODY_PREHEADER]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[I_08:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]		; CHECK-NEXT: [[I_08:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[R_07:%.]] = phi i32 [ [[ADD:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]		; CHECK-NEXT: [[R_07:%.]] = phi i32 [ [[ADD:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_08]]		; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_08]]
; CHECK-NEXT: [[TMP8:%.]] = load i32, i32 [[ARRAYIDX]], align 4		; CHECK-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX]], align 4
; CHECK-NEXT: [[C:%.*]] = icmp slt i32 [[R_07]], [[TMP8]]		; CHECK-NEXT: [[C:%.*]] = icmp slt i32 [[R_07]], [[TMP9]]
; CHECK-NEXT: [[ADD]] = select i1 [[C]], i32 [[R_07]], i32 [[TMP8]]		; CHECK-NEXT: [[ADD]] = select i1 [[C]], i32 [[R_07]], i32 [[TMP9]]
; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1		; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], [[LOOP13:!llvm.loop !.*]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], [[LOOP13:!llvm.loop !.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: [[R_0_LCSSA:%.]] = phi i32 [ 2147483647, [[ENTRY:%.]] ], [ [[ADD_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]		; CHECK-NEXT: [[R_0_LCSSA:%.]] = phi i32 [ 2147483647, [[ENTRY:%.]] ], [ [[ADD_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]
; CHECK-NEXT: ret i32 [[R_0_LCSSA]]		; CHECK-NEXT: ret i32 [[R_0_LCSSA]]
;		;
entry:		entry:
%cmp6 = icmp sgt i32 %n, 0		%cmp6 = icmp sgt i32 %n, 0
br i1 %cmp6, label %for.body, label %for.cond.cleanup		br i1 %cmp6, label %for.body, label %for.cond.cleanup
Show All 15 Lines
}		}

define i32 @i32_smax_reduction(i32* nocapture readonly %x, i32 %n) #0 {		define i32 @i32_smax_reduction(i32* nocapture readonly %x, i32 %n) #0 {
; CHECK-LABEL: @i32_smax_reduction(		; CHECK-LABEL: @i32_smax_reduction(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CMP6:%.]] = icmp sgt i32 [[N:%.]], 0		; CHECK-NEXT: [[CMP6:%.]] = icmp sgt i32 [[N:%.]], 0
; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.]], label [[FOR_COND_CLEANUP:%.]]		; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.]], label [[FOR_COND_CLEANUP:%.]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 4		; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, [[VECTOR_PH]] ], [ [[TMP5:%.]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>, [[VECTOR_PH]] ], [ [[TMP6:%.]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[INDEX]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0		; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[TMP0]]		; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[TMP0]]
		; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i32 0		; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i32 0
; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*		; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*
; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP3]], align 4		; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> undef)
; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[VEC_PHI]], [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[VEC_PHI]], [[WIDE_MASKED_LOAD]]
; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_MASKED_LOAD]]
		; CHECK-NEXT: [[TMP6]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP5]], <4 x i32> [[VEC_PHI]]
; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4		; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP14:!llvm.loop !.]]		; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP14:!llvm.loop !.]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])		; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]		; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ -2147483648, [[FOR_BODY_PREHEADER]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ -2147483648, [[FOR_BODY_PREHEADER]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[I_08:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]		; CHECK-NEXT: [[I_08:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[R_07:%.]] = phi i32 [ [[ADD:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]		; CHECK-NEXT: [[R_07:%.]] = phi i32 [ [[ADD:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_08]]		; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_08]]
; CHECK-NEXT: [[TMP8:%.]] = load i32, i32 [[ARRAYIDX]], align 4		; CHECK-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX]], align 4
; CHECK-NEXT: [[C:%.*]] = icmp sgt i32 [[R_07]], [[TMP8]]		; CHECK-NEXT: [[C:%.*]] = icmp sgt i32 [[R_07]], [[TMP9]]
; CHECK-NEXT: [[ADD]] = select i1 [[C]], i32 [[R_07]], i32 [[TMP8]]		; CHECK-NEXT: [[ADD]] = select i1 [[C]], i32 [[R_07]], i32 [[TMP9]]
; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1		; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], [[LOOP15:!llvm.loop !.*]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], [[LOOP15:!llvm.loop !.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: [[R_0_LCSSA:%.]] = phi i32 [ -2147483648, [[ENTRY:%.]] ], [ [[ADD_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]		; CHECK-NEXT: [[R_0_LCSSA:%.]] = phi i32 [ -2147483648, [[ENTRY:%.]] ], [ [[ADD_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]
; CHECK-NEXT: ret i32 [[R_0_LCSSA]]		; CHECK-NEXT: ret i32 [[R_0_LCSSA]]
;		;
entry:		entry:
%cmp6 = icmp sgt i32 %n, 0		%cmp6 = icmp sgt i32 %n, 0
br i1 %cmp6, label %for.body, label %for.cond.cleanup		br i1 %cmp6, label %for.body, label %for.cond.cleanup
Show All 15 Lines
}		}

define i32 @i32_umin_reduction(i32* nocapture readonly %x, i32 %n) #0 {		define i32 @i32_umin_reduction(i32* nocapture readonly %x, i32 %n) #0 {
; CHECK-LABEL: @i32_umin_reduction(		; CHECK-LABEL: @i32_umin_reduction(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CMP6:%.]] = icmp sgt i32 [[N:%.]], 0		; CHECK-NEXT: [[CMP6:%.]] = icmp sgt i32 [[N:%.]], 0
; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.]], label [[FOR_COND_CLEANUP:%.]]		; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.]], label [[FOR_COND_CLEANUP:%.]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 4		; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 -1, i32 -1, i32 -1, i32 -1>, [[VECTOR_PH]] ], [ [[TMP5:%.]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ <i32 -1, i32 -1, i32 -1, i32 -1>, [[VECTOR_PH]] ], [ [[TMP6:%.]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[INDEX]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0		; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[TMP0]]		; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[TMP0]]
		; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i32 0		; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i32 0
; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*		; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*
; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP3]], align 4		; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> undef)
; CHECK-NEXT: [[TMP4:%.*]] = icmp ult <4 x i32> [[VEC_PHI]], [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP4:%.*]] = icmp ult <4 x i32> [[VEC_PHI]], [[WIDE_MASKED_LOAD]]
; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_MASKED_LOAD]]
		; CHECK-NEXT: [[TMP6]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP5]], <4 x i32> [[VEC_PHI]]
; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4		; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP16:!llvm.loop !.]]		; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP16:!llvm.loop !.]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.experimental.vector.reduce.umin.v4i32(<4 x i32> [[TMP5]])		; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.umin.v4i32(<4 x i32> [[TMP6]])
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]		; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ -1, [[FOR_BODY_PREHEADER]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ -1, [[FOR_BODY_PREHEADER]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[I_08:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]		; CHECK-NEXT: [[I_08:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[R_07:%.]] = phi i32 [ [[ADD:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]		; CHECK-NEXT: [[R_07:%.]] = phi i32 [ [[ADD:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_08]]		; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_08]]
; CHECK-NEXT: [[TMP8:%.]] = load i32, i32 [[ARRAYIDX]], align 4		; CHECK-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX]], align 4
; CHECK-NEXT: [[C:%.*]] = icmp ult i32 [[R_07]], [[TMP8]]		; CHECK-NEXT: [[C:%.*]] = icmp ult i32 [[R_07]], [[TMP9]]
; CHECK-NEXT: [[ADD]] = select i1 [[C]], i32 [[R_07]], i32 [[TMP8]]		; CHECK-NEXT: [[ADD]] = select i1 [[C]], i32 [[R_07]], i32 [[TMP9]]
; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1		; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], [[LOOP17:!llvm.loop !.*]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], [[LOOP17:!llvm.loop !.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: [[R_0_LCSSA:%.]] = phi i32 [ -1, [[ENTRY:%.]] ], [ [[ADD_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]		; CHECK-NEXT: [[R_0_LCSSA:%.]] = phi i32 [ -1, [[ENTRY:%.]] ], [ [[ADD_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]
; CHECK-NEXT: ret i32 [[R_0_LCSSA]]		; CHECK-NEXT: ret i32 [[R_0_LCSSA]]
;		;
entry:		entry:
%cmp6 = icmp sgt i32 %n, 0		%cmp6 = icmp sgt i32 %n, 0
br i1 %cmp6, label %for.body, label %for.cond.cleanup		br i1 %cmp6, label %for.body, label %for.cond.cleanup
Show All 15 Lines
}		}

define i32 @i32_umax_reduction(i32* nocapture readonly %x, i32 %n) #0 {		define i32 @i32_umax_reduction(i32* nocapture readonly %x, i32 %n) #0 {
; CHECK-LABEL: @i32_umax_reduction(		; CHECK-LABEL: @i32_umax_reduction(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CMP6:%.]] = icmp sgt i32 [[N:%.]], 0		; CHECK-NEXT: [[CMP6:%.]] = icmp sgt i32 [[N:%.]], 0
; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.]], label [[FOR_COND_CLEANUP:%.]]		; CHECK-NEXT: br i1 [[CMP6]], label [[FOR_BODY_PREHEADER:%.]], label [[FOR_COND_CLEANUP:%.]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 4		; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[VEC_PHI:%.]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP6:%.]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[INDEX]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0		; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[TMP0]]		; CHECK-NEXT: [[TMP1:%.]] = getelementptr inbounds i32, i32 [[X:%.*]], i32 [[TMP0]]
		; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i32 0		; CHECK-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TMP1]], i32 0
; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*		; CHECK-NEXT: [[TMP3:%.]] = bitcast i32 [[TMP2]] to <4 x i32>*
; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP3]], align 4		; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> undef)
; CHECK-NEXT: [[TMP4:%.*]] = icmp ugt <4 x i32> [[VEC_PHI]], [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP4:%.*]] = icmp ugt <4 x i32> [[VEC_PHI]], [[WIDE_MASKED_LOAD]]
; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_LOAD]]		; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[WIDE_MASKED_LOAD]]
		; CHECK-NEXT: [[TMP6]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP5]], <4 x i32> [[VEC_PHI]]
; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4		; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP18:!llvm.loop !.]]		; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.]], label [[VECTOR_BODY]], [[LOOP18:!llvm.loop !.]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.experimental.vector.reduce.umax.v4i32(<4 x i32> [[TMP5]])		; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.umax.v4i32(<4 x i32> [[TMP6]])
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N]], [[N_VEC]]		; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[I_08:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]		; CHECK-NEXT: [[I_08:%.]] = phi i32 [ [[INC:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[R_07:%.]] = phi i32 [ [[ADD:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]		; CHECK-NEXT: [[R_07:%.]] = phi i32 [ [[ADD:%.]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_08]]		; CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[X]], i32 [[I_08]]
; CHECK-NEXT: [[TMP8:%.]] = load i32, i32 [[ARRAYIDX]], align 4		; CHECK-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX]], align 4
; CHECK-NEXT: [[C:%.*]] = icmp ugt i32 [[R_07]], [[TMP8]]		; CHECK-NEXT: [[C:%.*]] = icmp ugt i32 [[R_07]], [[TMP9]]
; CHECK-NEXT: [[ADD]] = select i1 [[C]], i32 [[R_07]], i32 [[TMP8]]		; CHECK-NEXT: [[ADD]] = select i1 [[C]], i32 [[R_07]], i32 [[TMP9]]
; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1		; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], [[LOOP19:!llvm.loop !.*]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], [[LOOP19:!llvm.loop !.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: [[R_0_LCSSA:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[ADD_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]		; CHECK-NEXT: [[R_0_LCSSA:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[ADD_LCSSA]], [[FOR_COND_CLEANUP_LOOPEXIT]] ]
; CHECK-NEXT: ret i32 [[R_0_LCSSA]]		; CHECK-NEXT: ret i32 [[R_0_LCSSA]]
;		;
entry:		entry:
%cmp6 = icmp sgt i32 %n, 0		%cmp6 = icmp sgt i32 %n, 0
br i1 %cmp6, label %for.body, label %for.cond.cleanup		br i1 %cmp6, label %for.body, label %for.cond.cleanup
Show All 19 Lines