This is an archive of the discontinued LLVM Phabricator instance.


Differential D134422

Scalarize calls to masked functions in LV
Closed, Public

Authored by huntergr on Sep 22 2022, 2:16 AM.

Details

Reviewers

reames
fhahn
david-arm
paulwalker-arm

Commits

rGa180344589ca: [LV] Allow scalarization of function calls when masking is required

Summary

This patch adds support for scalarizing calls to a function when a vector variant exists but cannot be used, either because there is no masked variant or because the cost model chose a VF without a masked variant.
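As a rough illustration of what scalarizing a masked call means at run time, the following plain C++ sketch (not LLVM code) walks the lanes of a fixed-width vector and only calls the scalar function for active lanes. The names `foo`, `scalarizedMaskedCall`, and the `PassThru` parameter are assumptions made for this sketch; in the generated IR the inactive lanes are simply `poison` rather than a chosen value.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar callee, standing in for `foo` in the tests.
inline std::int64_t foo(std::int64_t X) { return X * 2 + 1; }

// Scalarized, predicated form of "Out = foo(In)" for a fixed VF.
// Each lane is checked against the mask; the scalar call only runs for
// active lanes. Inactive lanes keep PassThru (an assumption of this
// sketch; the vectorizer leaves them as poison).
template <std::size_t VF>
std::array<std::int64_t, VF>
scalarizedMaskedCall(const std::array<std::int64_t, VF> &In,
                     const std::array<bool, VF> &Mask,
                     std::int64_t PassThru) {
  std::array<std::int64_t, VF> Out;
  Out.fill(PassThru);
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    if (Mask[Lane])              // one conditional block per lane
      Out[Lane] = foo(In[Lane]); // extract lane, call, insert result
  return Out;
}
```

With `VF = 2`, the loop corresponds to the two `pred.call.if` / `pred.call.continue` block pairs checked in `masked-call.ll`.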

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

huntergr created this revision. Sep 22 2022, 2:16 AM

Herald added a project: Restricted Project. Sep 22 2022, 2:16 AM

Herald added a subscriber: hiraditya.

huntergr requested review of this revision. Sep 22 2022, 2:16 AM

Herald added a project: Restricted Project. Sep 22 2022, 2:16 AM

Herald added subscribers: llvm-commits, pcwang-thead.

huntergr added a parent revision: D132458: [LoopVectorize] Synthesize mask operands for vector variants as needed. Sep 22 2022, 2:17 AM

Harbormaster completed remote builds in B188129: Diff 462118. Sep 22 2022, 2:17 AM

mgabka added a subscriber: mgabka. Sep 26 2022, 7:14 AM

huntergr mentioned this in D136251: [LoopVectorize] Use available masked vector functions when required. Oct 19 2022, 5:34 AM

Matt added a subscriber: Matt. Feb 2 2023, 1:15 PM

Herald added a subscriber: StephenFan. Feb 2 2023, 1:15 PM

paulwalker-arm added a subscriber: paulwalker-arm. Feb 14 2023, 10:34 AM

paulwalker-arm added inline comments.

llvm/include/llvm/Analysis/VectorUtils.h
129 (On Diff #462118)	Perhaps have this return `Optional<unsigned>` to give people the option to use either `if (isMasked())` or `if (auto Pos = getParamIndexForMask())` depending on what works best for them? It just seems bad to force people to traverse the shape parameters multiple times, and it means `getParamIndexForOptionalMask` can be removed.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1137–1140	Is it possible to use `any_of` here?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3583	Given we cannot scalarise scalable vectors, should this return `InvalidCost` somewhere? Or perhaps `blockCanBePredicated` should return false? Perhaps I'm being paranoid, but the code should be resilient to the case of somebody adding a bunch of scalable vector TLI mappings before all the LoopVectorize support is finished. Or do tests already exist for scalable vectors that show it does not matter?
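The `Optional<unsigned>` suggestion for `VectorUtils.h` could look roughly like the sketch below. It uses `std::optional` as a stand-in for `llvm::Optional`, and `ParamKind` and `Shape` are simplified, hypothetical types rather than the real `VFParamKind` and `VFShape`; only the accessor pattern is the point.

```cpp
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for LLVM's parameter-kind and
// shape types. Only what the sketch needs is modelled.
enum class ParamKind { Vector, GlobalPredicate };

struct Shape {
  std::vector<ParamKind> Parameters;

  // Returns the position of the mask parameter, or std::nullopt when the
  // shape has no mask. Callers that need the index traverse only once.
  std::optional<unsigned> getParamIndexForMask() const {
    for (unsigned I = 0; I < Parameters.size(); ++I)
      if (Parameters[I] == ParamKind::GlobalPredicate)
        return I;
    return std::nullopt;
  }

  // Convenience query for callers that only care whether a mask exists.
  bool isMasked() const { return getParamIndexForMask().has_value(); }
};
```

A caller can then write either `if (S.isMasked())` or `if (auto Pos = S.getParamIndexForMask())`, which is the choice the comment asks for.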

Rebased, amended patch per review comments.

huntergr marked 2 inline comments as done. Feb 27 2023, 8:27 AM

huntergr added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3583	So Cost should already be Invalid from the call to getScalarizationOverhead above, but I've added in another check here just to be sure -- while we might one day implement scalarization for a subset of operations if there's a need for it, I don't think this will be one of them due to the standard AArch64 PCS. I was going to add a test for this but can't find a nice way of doing so yet with the 3rd patch in the series implementing full mask support -- it will probably have to wait until we remove the restriction on needing a variant.
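The cost decision discussed in this exchange can be sketched in isolation as follows. This is a simplified model, not the code from `LoopVectorize.cpp`: `Cost`, `CallCostInputs`, and `vectorCallCost` are hypothetical names, `std::nullopt` stands in for `InstructionCost::getInvalid()`, and the `TLI` and `isNoBuiltin()` checks are left out.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical cost type: std::nullopt models an invalid cost.
using Cost = std::optional<unsigned>;

struct CallCostInputs {
  bool HasVectorVariant;   // models VecFunc != nullptr
  bool MaskRequired;       // models Legal->isMaskRequired(CI)
  bool ScalableVF;         // models VF.isScalable()
  unsigned ScalarizedCost; // cost of scalarizing the call
  unsigned VectorCallCost; // cost of calling the vector variant
};

inline Cost vectorCallCost(const CallCostInputs &In) {
  // Without a usable variant, or when a mask is required, the call must be
  // scalarized. That is impossible for scalable VFs, so report invalid.
  if (!In.HasVectorVariant || In.MaskRequired)
    return In.ScalableVF ? Cost(std::nullopt) : Cost(In.ScalarizedCost);
  // Otherwise use whichever of the two options is cheaper.
  return std::min(In.ScalarizedCost, In.VectorCallCost);
}
```

Returning an invalid cost, rather than an arbitrary large number, lets the caller rule out that VF entirely instead of merely discouraging it.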

Harbormaster completed remote builds in B216233: Diff 500802. Feb 27 2023, 9:50 AM

paulwalker-arm accepted this revision. Mar 1 2023, 9:31 AM

This revision is now accepted and ready to land. Mar 1 2023, 9:31 AM

This revision was landed with ongoing or failed builds. Mar 3 2023, 7:39 AM

Closed by commit rGa180344589ca: [LV] Allow scalarization of function calls when masking is required (authored by huntergr).

This revision was automatically updated to reflect the committed changes.

huntergr marked an inline comment as done.

huntergr added a commit: rGa180344589ca: [LV] Allow scalarization of function calls when masking is required.

Revision Contents

Path  Size
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp  14 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp  8 lines
llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll  222 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalarize-masked-call.ll  60 lines

Diff 502127

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

@@ ... @@ for (Instruction &I : *BB) {
     }
 
     // Do not let llvm.experimental.noalias.scope.decl block the vectorization.
     // TODO: there might be cases that it should block the vectorization. Let's
     // ignore those for now.
     if (isa<NoAliasScopeDeclInst>(&I))
       continue;
 
+    // We can allow masked calls if there's at least one vector variant, even
+    // if we end up scalarizing due to the cost model calculations.
+    // TODO: Allow other calls if they have appropriate attributes... readonly
+    // and argmemonly?
+    if (CallInst *CI = dyn_cast<CallInst>(&I)) {
+      // Check whether we have at least one masked vector version of a scalar
+      // function.
+      if (any_of(VFDatabase::getMappings(*CI),
+                 [](VFInfo &Info) { return Info.isMasked(); })) {
+        MaskedOp.insert(CI);
+        continue;
+      }
+    }
+
paulwalker-arm (inline comment, done): Is it possible to use `any_of` here?
     // Loads are handled via masking (or speculated if safe to do so.)
     if (auto *LI = dyn_cast<LoadInst>(&I)) {
       if (!SafePtrs.count(LI->getPointerOperand()))
         MaskedOp.insert(LI);
       continue;
     }
 
     // Predicated store requires some form of masking:
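The `any_of` suggestion in the inline comment amounts to replacing a hand-written search loop with a single predicate query. The pre-review code is not shown on this page, so the loop form below is only a guess at the kind of code the comment refers to; `Info` is a hypothetical stand-in for `llvm::VFInfo`, and `std::any_of` is used in place of LLVM's range-based `any_of`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for llvm::VFInfo; only the masked flag is modelled.
struct Info {
  bool Masked;
  bool isMasked() const { return Masked; }
};

// Hand-written loop form (assumed shape of the earlier code).
inline bool hasMaskedVariantLoop(const std::vector<Info> &Mappings) {
  for (const Info &I : Mappings)
    if (I.isMasked())
      return true;
  return false;
}

// Equivalent query expressed with std::any_of.
inline bool hasMaskedVariant(const std::vector<Info> &Mappings) {
  return std::any_of(Mappings.begin(), Mappings.end(),
                     [](const Info &I) { return I.isMasked(); });
}
```

Both functions return `false` for an empty list of mappings, so a call with no vector variants at all is not treated as maskable.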

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... @@ if (VecFunc) {
       MaskCost = TTI.getShuffleCost(
           TargetTransformInfo::SK_Broadcast,
           VectorType::get(
               IntegerType::getInt1Ty(VecFunc->getFunctionType()->getContext()),
               VF));
     }
   }
 
-  if (!TLI || CI->isNoBuiltin() || !VecFunc)
-    return Cost;
+  // We don't support masked function calls yet, but we can scalarize a
+  // masked call with branches (unless VF is scalable).
paulwalker-arm (inline comment, done): Given we cannot scalarise scalable vectors should this return `InvalidCost` somewhere? Or perhaps `blockCanBePredicated` should return false? Perhaps I'm being paranoid but the code should be resilient to the case of somebody adding a bunch of scalable vector TLI mapping before all the LoopVectorize support is finished. Or do tests already exist for scalable vectors that show it does not matter?
huntergr (author, inline comment, done): So Cost should already be Invalid from the call to getScalarizationOverhead above, but I've added in another check here just to be sure -- while we might one day implement scalarization for a subset of operations if there's a need for it, I don't think this will be one of them due to the standard AArch64 PCS. I was going to add a test for this but can't find a nice way of doing so yet with the 3rd patch in the series implementing full mask support -- it will probably have to wait until we remove the restriction on needing a variant.
+  if (!TLI || CI->isNoBuiltin() || !VecFunc || Legal->isMaskRequired(CI))
+    return VF.isScalable() ? InstructionCost::getInvalid() : Cost;
 
   // If the corresponding vector cost is cheaper, return its cost.
   InstructionCost VectorCallCost =
       TTI.getCallInstrCost(nullptr, RetTy, Tys, CostKind) + MaskCost;
   if (VectorCallCost < Cost) {
     *Variant = VecFunc;
     Cost = VectorCallCost;
   }
@@ ... @@ bool LoopVectorizationCostModel::isPredicatedInst(Instruction *I) const {
   }
   case Instruction::UDiv:
   case Instruction::SDiv:
   case Instruction::SRem:
   case Instruction::URem:
     // TODO: We can use the loop-preheader as context point here and get
     // context sensitive reasoning
     return !isSafeToSpeculativelyExecute(I);
+  case Instruction::Call:
+    return Legal->isMaskRequired(I);
   }
 }
 
 std::pair<InstructionCost, InstructionCost>
 LoopVectorizationCostModel::getDivRemSpeculationCost(Instruction *I,
                                                      ElementCount VF) const {
   assert(I->getOpcode() == Instruction::UDiv ||
          I->getOpcode() == Instruction::SDiv ||
@@ ... @@

llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll

@@ ... @@
 ; TFNONE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; TFNONE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
 ; TFNONE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
 ; TFNONE: for.cond.cleanup:
 ; TFNONE-NEXT: ret void
 ;
 ; TFALWAYS-LABEL: @test_widen(
 ; TFALWAYS-NEXT: entry:
+; TFALWAYS-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFALWAYS: vector.ph:
+; TFALWAYS-NEXT: br label [[VECTOR_BODY:%.*]]
+; TFALWAYS: vector.body:
+; TFALWAYS-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_CALL_CONTINUE2:%.*]] ]
+; TFALWAYS-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <2 x i1> [ <i1 true, i1 true>, [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[PRED_CALL_CONTINUE2]] ]
+; TFALWAYS-NEXT: [[TMP0:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
+; TFALWAYS-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr [[TMP0]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]], <2 x i64> poison)
+; TFALWAYS-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 0
+; TFALWAYS-NEXT: br i1 [[TMP1]], label [[PRED_CALL_IF:%.*]], label [[PRED_CALL_CONTINUE:%.*]]
+; TFALWAYS: pred.call.if:
+; TFALWAYS-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 0
+; TFALWAYS-NEXT: [[TMP3:%.*]] = call i64 @foo(i64 [[TMP2]]) #[[ATTR4:[0-9]+]]
+; TFALWAYS-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0
+; TFALWAYS-NEXT: br label [[PRED_CALL_CONTINUE]]
+; TFALWAYS: pred.call.continue:
+; TFALWAYS-NEXT: [[TMP5:%.*]] = phi <2 x i64> [ poison, [[VECTOR_BODY]] ], [ [[TMP4]], [[PRED_CALL_IF]] ]
+; TFALWAYS-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 1
+; TFALWAYS-NEXT: br i1 [[TMP6]], label [[PRED_CALL_IF1:%.*]], label [[PRED_CALL_CONTINUE2]]
+; TFALWAYS: pred.call.if1:
+; TFALWAYS-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 1
+; TFALWAYS-NEXT: [[TMP8:%.*]] = call i64 @foo(i64 [[TMP7]]) #[[ATTR4]]
+; TFALWAYS-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[TMP8]], i32 1
+; TFALWAYS-NEXT: br label [[PRED_CALL_CONTINUE2]]
+; TFALWAYS: pred.call.continue2:
+; TFALWAYS-NEXT: [[TMP10:%.*]] = phi <2 x i64> [ [[TMP5]], [[PRED_CALL_CONTINUE]] ], [ [[TMP9]], [[PRED_CALL_IF1]] ]
+; TFALWAYS-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
+; TFALWAYS-NEXT: call void @llvm.masked.store.v2i64.p0(<2 x i64> [[TMP10]], ptr [[TMP11]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]])
+; TFALWAYS-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
+; TFALWAYS-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 [[INDEX_NEXT]], i64 1024)
+; TFALWAYS-NEXT: [[TMP12:%.*]] = xor <2 x i1> [[ACTIVE_LANE_MASK_NEXT]], <i1 true, i1 true>
+; TFALWAYS-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP12]], i32 0
+; TFALWAYS-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; TFALWAYS: middle.block:
+; TFALWAYS-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; TFALWAYS: scalar.ph:
+; TFALWAYS-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; TFALWAYS-NEXT: br label [[FOR_BODY:%.*]]
 ; TFALWAYS: for.body:
-; TFALWAYS-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
-; TFALWAYS-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]
+; TFALWAYS-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; TFALWAYS-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
 ; TFALWAYS-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
-; TFALWAYS-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR1:[0-9]+]]
-; TFALWAYS-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]
+; TFALWAYS-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR4]]
+; TFALWAYS-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
 ; TFALWAYS-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
 ; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
-; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
 ; TFALWAYS: for.cond.cleanup:
 ; TFALWAYS-NEXT: ret void
 ;
 ; TFFALLBACK-LABEL: @test_widen(
 ; TFFALLBACK-NEXT: entry:
-; TFFALLBACK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFFALLBACK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFFALLBACK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
-; TFFALLBACK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFFALLBACK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; TFFALLBACK: vector.ph:
-; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; TFFALLBACK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
-; TFFALLBACK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
-; TFFALLBACK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
 ; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; TFFALLBACK: vector.body:
-; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; TFFALLBACK-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
-; TFFALLBACK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i64>, ptr [[TMP4]], align 4
-; TFFALLBACK-NEXT: [[TMP5:%.*]] = call <vscale x 2 x i64> @foo_vector(<vscale x 2 x i64> [[WIDE_LOAD]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer))
-; TFFALLBACK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
-; TFFALLBACK-NEXT: store <vscale x 2 x i64> [[TMP5]], ptr [[TMP6]], align 4
-; TFFALLBACK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
-; TFFALLBACK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 2
-; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
-; TFFALLBACK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; TFFALLBACK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_CALL_CONTINUE2:%.*]] ]
+; TFFALLBACK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <2 x i1> [ <i1 true, i1 true>, [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[PRED_CALL_CONTINUE2]] ]
+; TFFALLBACK-NEXT: [[TMP0:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
+; TFFALLBACK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr [[TMP0]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]], <2 x i64> poison)
+; TFFALLBACK-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 0
+; TFFALLBACK-NEXT: br i1 [[TMP1]], label [[PRED_CALL_IF:%.*]], label [[PRED_CALL_CONTINUE:%.*]]
+; TFFALLBACK: pred.call.if:
+; TFFALLBACK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 0
+; TFFALLBACK-NEXT: [[TMP3:%.*]] = call i64 @foo(i64 [[TMP2]]) #[[ATTR4:[0-9]+]]
+; TFFALLBACK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0
+; TFFALLBACK-NEXT: br label [[PRED_CALL_CONTINUE]]
+; TFFALLBACK: pred.call.continue:
+; TFFALLBACK-NEXT: [[TMP5:%.*]] = phi <2 x i64> [ poison, [[VECTOR_BODY]] ], [ [[TMP4]], [[PRED_CALL_IF]] ]
+; TFFALLBACK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 1
+; TFFALLBACK-NEXT: br i1 [[TMP6]], label [[PRED_CALL_IF1:%.*]], label [[PRED_CALL_CONTINUE2]]
+; TFFALLBACK: pred.call.if1:
+; TFFALLBACK-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 1
+; TFFALLBACK-NEXT: [[TMP8:%.*]] = call i64 @foo(i64 [[TMP7]]) #[[ATTR4]]
+; TFFALLBACK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[TMP8]], i32 1
+; TFFALLBACK-NEXT: br label [[PRED_CALL_CONTINUE2]]
+; TFFALLBACK: pred.call.continue2:
+; TFFALLBACK-NEXT: [[TMP10:%.*]] = phi <2 x i64> [ [[TMP5]], [[PRED_CALL_CONTINUE]] ], [ [[TMP9]], [[PRED_CALL_IF1]] ]
+; TFFALLBACK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
+; TFFALLBACK-NEXT: call void @llvm.masked.store.v2i64.p0(<2 x i64> [[TMP10]], ptr [[TMP11]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]])
+; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
+; TFFALLBACK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 [[INDEX_NEXT]], i64 1024)
+; TFFALLBACK-NEXT: [[TMP12:%.*]] = xor <2 x i1> [[ACTIVE_LANE_MASK_NEXT]], <i1 true, i1 true>
+; TFFALLBACK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP12]], i32 0
+; TFFALLBACK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; TFFALLBACK: middle.block:
-; TFFALLBACK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
-; TFFALLBACK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; TFFALLBACK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
 ; TFFALLBACK: scalar.ph:
-; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
 ; TFFALLBACK: for.body:
 ; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
 ; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
 ; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
-; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR2:[0-9]+]]
+; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR4]]
 ; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
 ; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
 ; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
 ; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
 ; TFFALLBACK: for.cond.cleanup:
 ; TFFALLBACK-NEXT: ret void
 ;
@@ ... @@
 ; TFALWAYS-NEXT: br label [[FOR_BODY:%.*]]
 ; TFALWAYS: for.body:
 ; TFALWAYS-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[IF_END:%.*]] ], [ 0, [[ENTRY:%.*]] ]
 ; TFALWAYS-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]
 ; TFALWAYS-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
 ; TFALWAYS-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50
 ; TFALWAYS-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_END]]
 ; TFALWAYS: if.then:
-; TFALWAYS-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR1]]
+; TFALWAYS-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR4]]
 ; TFALWAYS-NEXT: br label [[IF_END]]
 ; TFALWAYS: if.end:
 ; TFALWAYS-NEXT: [[TMP2:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ]
 ; TFALWAYS-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]
 ; TFALWAYS-NEXT: store i64 [[TMP2]], ptr [[ARRAYIDX1]], align 8
 ; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
 ; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
 ; TFALWAYS: for.cond.cleanup:
 ; TFALWAYS-NEXT: ret void
 ;
 ; TFFALLBACK-LABEL: @test_if_then(
 ; TFFALLBACK-NEXT: entry:
 ; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
 ; TFFALLBACK: for.body:
 ; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[IF_END:%.*]] ], [ 0, [[ENTRY:%.*]] ]
 ; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]
 ; TFFALLBACK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
 ; TFFALLBACK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50
 ; TFFALLBACK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_END]]
 ; TFFALLBACK: if.then:
-; TFFALLBACK-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR2]]
+; TFFALLBACK-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR4]]
 ; TFFALLBACK-NEXT: br label [[IF_END]]
 ; TFFALLBACK: if.end:
 ; TFFALLBACK-NEXT: [[TMP2:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ]
 ; TFFALLBACK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]
 ; TFFALLBACK-NEXT: store i64 [[TMP2]], ptr [[ARRAYIDX1]], align 8
 ; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
 ; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines
	; TFALWAYS-NEXT: br label [[FOR_BODY:%.*]]			; TFALWAYS-NEXT: br label [[FOR_BODY:%.*]]
	; TFALWAYS: for.body:			; TFALWAYS: for.body:
	; TFALWAYS-NEXT: [[INDVARS_IV:%.]] = phi i64 [ [[INDVARS_IV_NEXT:%.]], [[IF_END:%.]] ], [ 0, [[ENTRY:%.]] ]			; TFALWAYS-NEXT: [[INDVARS_IV:%.]] = phi i64 [ [[INDVARS_IV_NEXT:%.]], [[IF_END:%.]] ], [ 0, [[ENTRY:%.]] ]
	; TFALWAYS-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i64, ptr [[A:%.]], i64 [[INDVARS_IV]]			; TFALWAYS-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i64, ptr [[A:%.]], i64 [[INDVARS_IV]]
	; TFALWAYS-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; TFALWAYS-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; TFALWAYS-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50			; TFALWAYS-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50
	; TFALWAYS-NEXT: br i1 [[CMP]], label [[IF_THEN:%.]], label [[IF_ELSE:%.]]			; TFALWAYS-NEXT: br i1 [[CMP]], label [[IF_THEN:%.]], label [[IF_ELSE:%.]]
	; TFALWAYS: if.then:			; TFALWAYS: if.then:
	; TFALWAYS-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR2:[0-9]+]]			; TFALWAYS-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR5:[0-9]+]]
	; TFALWAYS-NEXT: br label [[IF_END]]			; TFALWAYS-NEXT: br label [[IF_END]]
	; TFALWAYS: if.else:			; TFALWAYS: if.else:
	; TFALWAYS-NEXT: [[TMP2:%.*]] = call i64 @foo(i64 0) #[[ATTR2]]			; TFALWAYS-NEXT: [[TMP2:%.*]] = call i64 @foo(i64 0) #[[ATTR5]]
	; TFALWAYS-NEXT: br label [[IF_END]]			; TFALWAYS-NEXT: br label [[IF_END]]
	; TFALWAYS: if.end:			; TFALWAYS: if.end:
	; TFALWAYS-NEXT: [[TMP3:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ [[TMP2]], [[IF_ELSE]] ]			; TFALWAYS-NEXT: [[TMP3:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ [[TMP2]], [[IF_ELSE]] ]
	; TFALWAYS-NEXT: [[ARRAYIDX1:%.]] = getelementptr inbounds i64, ptr [[B:%.]], i64 [[INDVARS_IV]]			; TFALWAYS-NEXT: [[ARRAYIDX1:%.]] = getelementptr inbounds i64, ptr [[B:%.]], i64 [[INDVARS_IV]]
	; TFALWAYS-NEXT: store i64 [[TMP3]], ptr [[ARRAYIDX1]], align 8			; TFALWAYS-NEXT: store i64 [[TMP3]], ptr [[ARRAYIDX1]], align 8
	; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	; TFALWAYS: for.cond.cleanup:			; TFALWAYS: for.cond.cleanup:
	; TFALWAYS-NEXT: ret void			; TFALWAYS-NEXT: ret void
	;			;
	; TFFALLBACK-LABEL: @test_widen_if_then_else(			; TFFALLBACK-LABEL: @test_widen_if_then_else(
	; TFFALLBACK-NEXT: entry:			; TFFALLBACK-NEXT: entry:
	; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
	; TFFALLBACK: for.body:			; TFFALLBACK: for.body:
	; TFFALLBACK-NEXT: [[INDVARS_IV:%.]] = phi i64 [ [[INDVARS_IV_NEXT:%.]], [[IF_END:%.]] ], [ 0, [[ENTRY:%.]] ]			; TFFALLBACK-NEXT: [[INDVARS_IV:%.]] = phi i64 [ [[INDVARS_IV_NEXT:%.]], [[IF_END:%.]] ], [ 0, [[ENTRY:%.]] ]
	; TFFALLBACK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i64, ptr [[A:%.]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i64, ptr [[A:%.]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; TFFALLBACK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; TFFALLBACK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50			; TFFALLBACK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50
	; TFFALLBACK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.]], label [[IF_ELSE:%.]]			; TFFALLBACK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.]], label [[IF_ELSE:%.]]
	; TFFALLBACK: if.then:			; TFFALLBACK: if.then:
	; TFFALLBACK-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR3:[0-9]+]]			; TFFALLBACK-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR5:[0-9]+]]
	; TFFALLBACK-NEXT: br label [[IF_END]]			; TFFALLBACK-NEXT: br label [[IF_END]]
	; TFFALLBACK: if.else:			; TFFALLBACK: if.else:
	; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @foo(i64 0) #[[ATTR3]]			; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @foo(i64 0) #[[ATTR5]]
	; TFFALLBACK-NEXT: br label [[IF_END]]			; TFFALLBACK-NEXT: br label [[IF_END]]
	; TFFALLBACK: if.end:			; TFFALLBACK: if.end:
	; TFFALLBACK-NEXT: [[TMP3:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ [[TMP2]], [[IF_ELSE]] ]			; TFFALLBACK-NEXT: [[TMP3:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ [[TMP2]], [[IF_ELSE]] ]
	; TFFALLBACK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: store i64 [[TMP3]], ptr [[ARRAYIDX1]], align 8			; TFFALLBACK-NEXT: store i64 [[TMP3]], ptr [[ARRAYIDX1]], align 8
	; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	;			;
	; TFALWAYS-LABEL: @test_widen_nomask(			; TFALWAYS-LABEL: @test_widen_nomask(
	; TFALWAYS-NEXT: entry:			; TFALWAYS-NEXT: entry:
	; TFALWAYS-NEXT: br label [[FOR_BODY:%.*]]			; TFALWAYS-NEXT: br label [[FOR_BODY:%.*]]
	; TFALWAYS: for.body:			; TFALWAYS: for.body:
	; TFALWAYS-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; TFALWAYS-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; TFALWAYS-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]			; TFALWAYS-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]
	; TFALWAYS-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4			; TFALWAYS-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
	; TFALWAYS-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR3:[0-9]+]]			; TFALWAYS-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR6:[0-9]+]]
	; TFALWAYS-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]			; TFALWAYS-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]
	; TFALWAYS-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4			; TFALWAYS-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
	; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	; TFALWAYS: for.cond.cleanup:			; TFALWAYS: for.cond.cleanup:
	; TFALWAYS-NEXT: ret void			; TFALWAYS-NEXT: ret void
	;			;
	; TFFALLBACK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; TFFALLBACK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; TFFALLBACK: scalar.ph:			; TFFALLBACK: scalar.ph:
	; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
	; TFFALLBACK: for.body:			; TFFALLBACK: for.body:
	; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4			; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
	; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR4:[0-9]+]]			; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR6:[0-9]+]]
	; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4			; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
	; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; TFFALLBACK: for.cond.cleanup:			; TFFALLBACK: for.cond.cleanup:
	; TFFALLBACK-NEXT: ret void			; TFFALLBACK-NEXT: ret void
	;			;
	; TFNONE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFNONE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFNONE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFNONE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFNONE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; TFNONE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; TFNONE: for.cond.cleanup:			; TFNONE: for.cond.cleanup:
	; TFNONE-NEXT: ret void			; TFNONE-NEXT: ret void
	;			;
	; TFALWAYS-LABEL: @test_widen_optmask(			; TFALWAYS-LABEL: @test_widen_optmask(
	; TFALWAYS-NEXT: entry:			; TFALWAYS-NEXT: entry:
				; TFALWAYS-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; TFALWAYS: vector.ph:
				; TFALWAYS-NEXT: br label [[VECTOR_BODY:%.*]]
				; TFALWAYS: vector.body:
				; TFALWAYS-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_CALL_CONTINUE2:%.*]] ]
				; TFALWAYS-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <2 x i1> [ <i1 true, i1 true>, [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[PRED_CALL_CONTINUE2]] ]
				; TFALWAYS-NEXT: [[TMP0:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
				; TFALWAYS-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr [[TMP0]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]], <2 x i64> poison)
				; TFALWAYS-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 0
				; TFALWAYS-NEXT: br i1 [[TMP1]], label [[PRED_CALL_IF:%.*]], label [[PRED_CALL_CONTINUE:%.*]]
				; TFALWAYS: pred.call.if:
				; TFALWAYS-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 0
				; TFALWAYS-NEXT: [[TMP3:%.*]] = call i64 @foo(i64 [[TMP2]]) #[[ATTR7:[0-9]+]]
				; TFALWAYS-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0
				; TFALWAYS-NEXT: br label [[PRED_CALL_CONTINUE]]
				; TFALWAYS: pred.call.continue:
				; TFALWAYS-NEXT: [[TMP5:%.*]] = phi <2 x i64> [ poison, [[VECTOR_BODY]] ], [ [[TMP4]], [[PRED_CALL_IF]] ]
				; TFALWAYS-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 1
				; TFALWAYS-NEXT: br i1 [[TMP6]], label [[PRED_CALL_IF1:%.*]], label [[PRED_CALL_CONTINUE2]]
				; TFALWAYS: pred.call.if1:
				; TFALWAYS-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 1
				; TFALWAYS-NEXT: [[TMP8:%.*]] = call i64 @foo(i64 [[TMP7]]) #[[ATTR7]]
				; TFALWAYS-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[TMP8]], i32 1
				; TFALWAYS-NEXT: br label [[PRED_CALL_CONTINUE2]]
				; TFALWAYS: pred.call.continue2:
				; TFALWAYS-NEXT: [[TMP10:%.*]] = phi <2 x i64> [ [[TMP5]], [[PRED_CALL_CONTINUE]] ], [ [[TMP9]], [[PRED_CALL_IF1]] ]
				; TFALWAYS-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
				; TFALWAYS-NEXT: call void @llvm.masked.store.v2i64.p0(<2 x i64> [[TMP10]], ptr [[TMP11]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]])
				; TFALWAYS-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
				; TFALWAYS-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 [[INDEX_NEXT]], i64 1024)
				; TFALWAYS-NEXT: [[TMP12:%.*]] = xor <2 x i1> [[ACTIVE_LANE_MASK_NEXT]], <i1 true, i1 true>
				; TFALWAYS-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP12]], i32 0
				; TFALWAYS-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
				; TFALWAYS: middle.block:
				; TFALWAYS-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
				; TFALWAYS: scalar.ph:
				; TFALWAYS-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; TFALWAYS-NEXT: br label [[FOR_BODY:%.*]]			; TFALWAYS-NEXT: br label [[FOR_BODY:%.*]]
	; TFALWAYS: for.body:			; TFALWAYS: for.body:
	; TFALWAYS-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; TFALWAYS-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; TFALWAYS-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]			; TFALWAYS-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
	; TFALWAYS-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4			; TFALWAYS-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
	; TFALWAYS-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR4:[0-9]+]]			; TFALWAYS-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR7]]
	; TFALWAYS-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]			; TFALWAYS-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
	; TFALWAYS-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4			; TFALWAYS-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
	; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; TFALWAYS: for.cond.cleanup:			; TFALWAYS: for.cond.cleanup:
	; TFALWAYS-NEXT: ret void			; TFALWAYS-NEXT: ret void
	;			;
	; TFFALLBACK-LABEL: @test_widen_optmask(			; TFFALLBACK-LABEL: @test_widen_optmask(
	; TFFALLBACK-NEXT: entry:			; TFFALLBACK-NEXT: entry:
	; TFFALLBACK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; TFFALLBACK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; TFFALLBACK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
	; TFFALLBACK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
	; TFFALLBACK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; TFFALLBACK: vector.ph:			; TFFALLBACK: vector.ph:
	; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; TFFALLBACK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
	; TFFALLBACK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
	; TFFALLBACK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
	; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]
	; TFFALLBACK: vector.body:			; TFFALLBACK: vector.body:
	; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_CALL_CONTINUE2:%.*]] ]
	; TFFALLBACK-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]			; TFFALLBACK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <2 x i1> [ <i1 true, i1 true>, [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[PRED_CALL_CONTINUE2]] ]
	; TFFALLBACK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i64>, ptr [[TMP4]], align 4			; TFFALLBACK-NEXT: [[TMP0:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
	; TFFALLBACK-NEXT: [[TMP5:%.*]] = call <vscale x 2 x i64> @foo_vector_nomask(<vscale x 2 x i64> [[WIDE_LOAD]])			; TFFALLBACK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr [[TMP0]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]], <2 x i64> poison)
	; TFFALLBACK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]			; TFFALLBACK-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 0
	; TFFALLBACK-NEXT: store <vscale x 2 x i64> [[TMP5]], ptr [[TMP6]], align 4			; TFFALLBACK-NEXT: br i1 [[TMP1]], label [[PRED_CALL_IF:%.*]], label [[PRED_CALL_CONTINUE:%.*]]
	; TFFALLBACK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()			; TFFALLBACK: pred.call.if:
	; TFFALLBACK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 2			; TFFALLBACK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 0
	; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]			; TFFALLBACK-NEXT: [[TMP3:%.*]] = call i64 @foo(i64 [[TMP2]]) #[[ATTR7:[0-9]+]]
	; TFFALLBACK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; TFFALLBACK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0
	; TFFALLBACK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; TFFALLBACK-NEXT: br label [[PRED_CALL_CONTINUE]]
				; TFFALLBACK: pred.call.continue:
				; TFFALLBACK-NEXT: [[TMP5:%.*]] = phi <2 x i64> [ poison, [[VECTOR_BODY]] ], [ [[TMP4]], [[PRED_CALL_IF]] ]
				; TFFALLBACK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 1
				; TFFALLBACK-NEXT: br i1 [[TMP6]], label [[PRED_CALL_IF1:%.*]], label [[PRED_CALL_CONTINUE2]]
				; TFFALLBACK: pred.call.if1:
				; TFFALLBACK-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 1
				; TFFALLBACK-NEXT: [[TMP8:%.*]] = call i64 @foo(i64 [[TMP7]]) #[[ATTR7]]
				; TFFALLBACK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[TMP8]], i32 1
				; TFFALLBACK-NEXT: br label [[PRED_CALL_CONTINUE2]]
				; TFFALLBACK: pred.call.continue2:
				; TFFALLBACK-NEXT: [[TMP10:%.*]] = phi <2 x i64> [ [[TMP5]], [[PRED_CALL_CONTINUE]] ], [ [[TMP9]], [[PRED_CALL_IF1]] ]
				; TFFALLBACK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
				; TFFALLBACK-NEXT: call void @llvm.masked.store.v2i64.p0(<2 x i64> [[TMP10]], ptr [[TMP11]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]])
				; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
				; TFFALLBACK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 [[INDEX_NEXT]], i64 1024)
				; TFFALLBACK-NEXT: [[TMP12:%.*]] = xor <2 x i1> [[ACTIVE_LANE_MASK_NEXT]], <i1 true, i1 true>
				; TFFALLBACK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP12]], i32 0
				; TFFALLBACK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; TFFALLBACK: middle.block:			; TFFALLBACK: middle.block:
	; TFFALLBACK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]			; TFFALLBACK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; TFFALLBACK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; TFFALLBACK: scalar.ph:			; TFFALLBACK: scalar.ph:
	; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
	; TFFALLBACK: for.body:			; TFFALLBACK: for.body:
	; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4			; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
	; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR5:[0-9]+]]			; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR7]]
	; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4			; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
	; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; TFFALLBACK: for.cond.cleanup:			; TFFALLBACK: for.cond.cleanup:
	; TFFALLBACK-NEXT: ret void			; TFFALLBACK-NEXT: ret void
	;			;

llvm/test/Transforms/LoopVectorize/scalarize-masked-call.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=loop-vectorize,instsimplify -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s			; RUN: opt < %s -passes=loop-vectorize,instsimplify -force-vector-interleave=1 -force-vector-width=2 -S 2>&1 | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define void @cond_call(ptr readonly %src, ptr noalias %dest, i64 %N) {			define void @cond_call(ptr readonly %src, ptr noalias %dest, i64 %N) {
	; CHECK-LABEL: @cond_call(			; CHECK-LABEL: @cond_call(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 2
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 2
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_CALL_CONTINUE2:%.*]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i64, ptr [[SRC:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, ptr [[TMP0]], align 8
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i64> [[WIDE_LOAD]], <i64 5, i64 5>
				; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0
				; CHECK-NEXT: br i1 [[TMP2]], label [[PRED_CALL_IF:%.*]], label [[PRED_CALL_CONTINUE:%.*]]
				; CHECK: pred.call.if:
				; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 0
				; CHECK-NEXT: [[TMP4:%.*]] = call i64 @foo(i64 [[TMP3]]) #[[ATTR0:[0-9]+]]
				; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i64> poison, i64 [[TMP4]], i32 0
				; CHECK-NEXT: br label [[PRED_CALL_CONTINUE]]
				; CHECK: pred.call.continue:
				; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i64> [ poison, [[VECTOR_BODY]] ], [ [[TMP5]], [[PRED_CALL_IF]] ]
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1
				; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_CALL_IF1:%.*]], label [[PRED_CALL_CONTINUE2]]
				; CHECK: pred.call.if1:
				; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 1
				; CHECK-NEXT: [[TMP9:%.*]] = call i64 @foo(i64 [[TMP8]]) #[[ATTR0]]
				; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i64> [[TMP6]], i64 [[TMP9]], i32 1
				; CHECK-NEXT: br label [[PRED_CALL_CONTINUE2]]
				; CHECK: pred.call.continue2:
				; CHECK-NEXT: [[TMP11:%.*]] = phi <2 x i64> [ [[TMP6]], [[PRED_CALL_CONTINUE]] ], [ [[TMP10]], [[PRED_CALL_IF1]] ]
				; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP1]], <2 x i64> [[TMP11]], <2 x i64> [[WIDE_LOAD]]
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[DEST:%.*]], i64 [[INDEX]]
				; CHECK-NEXT: store <2 x i64> [[PREDPHI]], ptr [[TMP12]], align 8
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_LOOP:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_LOOP:%.*]] ]
	; CHECK-NEXT: [[LD_ADDR:%.*]] = getelementptr inbounds i64, ptr [[SRC:%.*]], i64 [[IV]]			; CHECK-NEXT: [[LD_ADDR:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[IV]]
	; CHECK-NEXT: [[LD_VALUE:%.*]] = load i64, ptr [[LD_ADDR]], align 8			; CHECK-NEXT: [[LD_VALUE:%.*]] = load i64, ptr [[LD_ADDR]], align 8
	; CHECK-NEXT: [[IFCOND:%.*]] = icmp ult i64 [[LD_VALUE]], 5			; CHECK-NEXT: [[IFCOND:%.*]] = icmp ult i64 [[LD_VALUE]], 5
	; CHECK-NEXT: br i1 [[IFCOND]], label [[IF_THEN:%.*]], label [[FOR_LOOP]]			; CHECK-NEXT: br i1 [[IFCOND]], label [[IF_THEN:%.*]], label [[FOR_LOOP]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[FOO_RET:%.*]] = call i64 @foo(i64 [[LD_VALUE]])			; CHECK-NEXT: [[FOO_RET:%.*]] = call i64 @foo(i64 [[LD_VALUE]]) #[[ATTR0]]
	; CHECK-NEXT: br label [[FOR_LOOP]]			; CHECK-NEXT: br label [[FOR_LOOP]]
	; CHECK: for.loop:			; CHECK: for.loop:
	; CHECK-NEXT: [[ST_VALUE:%.*]] = phi i64 [ [[LD_VALUE]], [[FOR_BODY]] ], [ [[FOO_RET]], [[IF_THEN]] ]			; CHECK-NEXT: [[ST_VALUE:%.*]] = phi i64 [ [[LD_VALUE]], [[FOR_BODY]] ], [ [[FOO_RET]], [[IF_THEN]] ]
	; CHECK-NEXT: [[ST_ADDR:%.*]] = getelementptr inbounds i64, ptr [[DEST:%.*]], i64 [[IV]]			; CHECK-NEXT: [[ST_ADDR:%.*]] = getelementptr inbounds i64, ptr [[DEST]], i64 [[IV]]
	; CHECK-NEXT: store i64 [[ST_VALUE]], ptr [[ST_ADDR]], align 8			; CHECK-NEXT: store i64 [[ST_VALUE]], ptr [[ST_ADDR]], align 8
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[LOOPCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N:%.*]]			; CHECK-NEXT: [[LOOPCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[LOOPCOND]], label [[END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[LOOPCOND]], label [[END]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.loop ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.loop ]
	%ld.addr = getelementptr inbounds i64, ptr %src, i64 %iv			%ld.addr = getelementptr inbounds i64, ptr %src, i64 %iv
	%ld.value = load i64, ptr %ld.addr, align 8			%ld.value = load i64, ptr %ld.addr, align 8
	%ifcond = icmp ult i64 %ld.value, 5			%ifcond = icmp ult i64 %ld.value, 5
	br i1 %ifcond, label %if.then, label %for.loop			br i1 %ifcond, label %if.then, label %for.loop

	if.then:			if.then:
	%foo.ret = call i64 @foo(i64 %ld.value)			%foo.ret = call i64 @foo(i64 %ld.value) #0
	br label %for.loop			br label %for.loop

	for.loop:			for.loop:
	%st.value = phi i64 [ %ld.value, %for.body ], [ %foo.ret, %if.then ]			%st.value = phi i64 [ %ld.value, %for.body ], [ %foo.ret, %if.then ]
	%st.addr = getelementptr inbounds i64, ptr %dest, i64 %iv			%st.addr = getelementptr inbounds i64, ptr %dest, i64 %iv
	store i64 %st.value, ptr %st.addr, align 8			store i64 %st.value, ptr %st.addr, align 8
	%iv.next = add nsw nuw i64 %iv, 1			%iv.next = add nsw nuw i64 %iv, 1
	%loopcond = icmp eq i64 %iv.next, %N			%loopcond = icmp eq i64 %iv.next, %N
	br i1 %loopcond, label %end, label %for.body			br i1 %loopcond, label %end, label %for.body

	end:			end:
	ret void			ret void
	}			}

	declare i64 @foo(i64)			declare i64 @foo(i64) #0
	declare <4 x i64> @vector_foo(<4 x i64>)			declare <4 x i64> @vector_foo(<4 x i64>, <4 x i1>)

	; We need a vector variant in order to allow for vectorization at present, but			; We need a vector variant in order to allow for vectorization at present, but
	; we want to test scalarization of conditional calls. If we provide a variant			; we want to test scalarization of conditional calls. If we provide a variant
	; with a different number of lanes than the VF we force via			; with a different number of lanes than the VF we force via
	; "-force-vector-width=2", then it should pass the legality checks but			; "-force-vector-width=2", then it should pass the legality checks but
	; scalarize. TODO: Remove the requirement to have a variant.			; scalarize. TODO: Remove the requirement to have a variant.
	attributes #0 = { readonly nounwind "vector-function-abi-variant"="_ZGV_LLVM_M4v_foo(vector_foo)" }			attributes #0 = { readonly nounwind "vector-function-abi-variant"="_ZGV_LLVM_M4v_foo(vector_foo)" }
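
The test comment above relies on the variant string `_ZGV_LLVM_M4v_foo(vector_foo)` describing a masked, 4-lane variant, which cannot serve the forced VF of 2. As a rough illustration only (not LLVM's own demangler), the Python sketch below decodes that string under a simplified reading of the vector function ABI name layout `_ZGV<isa><mask><vlen><params>_<scalar>(<vector>)`. The names `VariantInfo`, `parse_variant`, and `lanes_match` are hypothetical helpers invented for this sketch, and the lane comparison is a deliberate simplification of the vectorizer's legality and cost checks.

```python
import re
from typing import NamedTuple


class VariantInfo(NamedTuple):
    isa: str          # "_LLVM_" for the LLVM-internal ISA, otherwise one letter
    masked: bool      # True for "M" (variant takes a mask), False for "N"
    lanes: str        # decimal lane count, or "x" for a scalable variant
    params: str       # parameter kinds, e.g. "v" for one vector parameter
    scalar_name: str
    vector_name: str


# Simplified grammar (an assumption of this sketch): only the common
# _ZGV<isa><mask><vlen><params>_<scalar>(<vector>) shape is accepted.
_VARIANT_RE = re.compile(
    r"^_ZGV(?P<isa>_LLVM_|[a-zA-Z])(?P<mask>[MN])(?P<lanes>\d+|x)"
    r"(?P<params>[a-zA-Z0-9]+)_(?P<scalar>[^()]+)\((?P<vector>[^()]+)\)$"
)


def parse_variant(mangled: str) -> VariantInfo:
    """Split a variant mapping string into its fields."""
    m = _VARIANT_RE.match(mangled)
    if m is None:
        raise ValueError(f"unrecognized variant string: {mangled!r}")
    return VariantInfo(
        isa=m.group("isa"),
        masked=m.group("mask") == "M",
        lanes=m.group("lanes"),
        params=m.group("params"),
        scalar_name=m.group("scalar"),
        vector_name=m.group("vector"),
    )


def lanes_match(info: VariantInfo, vf: int) -> bool:
    """True when a fixed-width variant has exactly `vf` lanes."""
    return info.lanes.isdigit() and int(info.lanes) == vf


if __name__ == "__main__":
    info = parse_variant("_ZGV_LLVM_M4v_foo(vector_foo)")
    print(info)
    print("usable at VF=2:", lanes_match(info, 2))
```

Under these assumptions the variant reports four lanes, so it does not match the VF of 2 forced by `-force-vector-width=2`. That is consistent with the CHECK lines above, where the conditional call to `@foo` is emitted once per lane in `pred.call.if` blocks instead of as a call to `@vector_foo`.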