This is an archive of the discontinued LLVM Phabricator instance.


Differential D130164

[LV] Support predicated div/rem operations via safe-divisor select idiom
ClosedPublic

Authored by reames on Jul 20 2022, 6:10 AM.


Details

Reviewers

david-arm
fhahn
Ayal
gilr

Commits

rGf79214d1e1fd: [LV] Support predicated div/rem operations via safe-divisor select idiom

Summary

This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors.

The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditionally select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select.
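As a rough illustration of the idiom described above, here is a scalar C++ emulation in which each loop iteration stands in for one vector lane. This is only a sketch, not the vectorizer's code: the function name `guardedUDiv` and the `mask` and `passthru` parameters are hypothetical, and it assumes active lanes never see a zero divisor (which holds whenever the original scalar loop was well defined).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar emulation of the safe-divisor select idiom; one iteration per lane.
// `mask[i]` models the active lane mask, `passthru[i]` the value an inactive
// lane keeps. Both names are hypothetical and exist only for illustration.
std::vector<uint32_t> guardedUDiv(const std::vector<uint32_t> &lhs,
                                  const std::vector<uint32_t> &rhs,
                                  const std::vector<bool> &mask,
                                  const std::vector<uint32_t> &passthru) {
  assert(lhs.size() == rhs.size() && lhs.size() == mask.size() &&
         lhs.size() == passthru.size() && "all lanes must be present");
  std::vector<uint32_t> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    // select(mask, rhs, 1): inactive lanes get the always-safe divisor 1.
    uint32_t safeRhs = mask[i] ? rhs[i] : 1u;
    // The divide now executes on every lane without dividing by zero.
    uint32_t quot = lhs[i] / safeRhs;
    // Block predication: the result of an inactive lane is discarded.
    out[i] = mask[i] ? quot : passthru[i];
  }
  return out;
}
```

The second select models the one generated after the div/rem for block predication; the zero divisor in an inactive lane is never used.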

There are some potential further improvements code-wise, but I don't think any of them are worth bothering with. The difference between vectorizing and not vectorizing is huge; the remaining minor wins are, well, minor. Just for the record, here are the things I currently know of:

For the block predication, we're generating a select after the div/rem. This is arguably redundant with the select before the div/rem, and we could try to fold the later one out.
For targets with masked vector div/rem operations, we could make sure to pattern match the select. (Div/Rem by 1 is the same as using the undisturbed LHS.)

This is an area of the code I'm fairly new to, and I'm not quite clear on the desired design re: VPlan. Given that, I'm going to take extra care to lay out the major design decisions so that reviewers can tell me I did this wrong and how to rework it. :)

Strategy/Lowering wise, we've got a couple choices:

This approach which uses the active lane mask with the select.
Forming an independent safety mask by explicitly checking for UB triggering edge cases. This works, but seems to generate strictly worse code for the generic case. (This is actually where I started.)
Lowering to VP intrinsic. I'd prefer not to couple this to the VP work, and the select lowering is profitable on its own. Even for targets which don't have VL or mask predication on the udiv itself. Given that, we'd probably want a cost model based mechanism to expand the VP op back out anyways, and I'd prefer to think about that whole problem separately.
Lowering to a loop with internal predication. We could replace replication with a recipe which generates a sub-loop. The sub-loop would extract each lane, optionally do the divide, and insert it back. Doing this would avoid potential concerns about speculation profitability, but appears to be a more invasive change to the vectorizer. I'm unconvinced this is worthwhile for this case.
We could add a udiv variant which does not fault. Most (all?) real vector hardware does not fault, so we could have a variant which directly represented this. This is a tricky design space, and is a larger topic than I really want to get into now.
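For comparison with the second option in the list above (an independent safety mask), the following sketch shows one reading of "explicitly checking for UB triggering edge cases" for a 32-bit signed divide. It is an assumption about what such a check would look like, not code from the patch; `sdivIsUB` and `checkedSDiv` are hypothetical names. The two cases tested are the ones LLVM IR defines as undefined for `sdiv`/`srem`: a zero divisor, and the overflowing minimum-value-by-minus-one case.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if `a / b` (or `a % b`) would be undefined for 32-bit signed operands:
// division by zero, or the overflowing INT32_MIN / -1 case.
bool sdivIsUB(int32_t a, int32_t b) {
  return b == 0 || (a == std::numeric_limits<int32_t>::min() && b == -1);
}

// Divide guarded by the independent safety check rather than by the block
// mask. `fallback` is a hypothetical stand-in for whatever an unsafe lane
// should produce.
int32_t checkedSDiv(int32_t a, int32_t b, int32_t fallback) {
  bool unsafe = sdivIsUB(a, b);
  int32_t safeB = unsafe ? 1 : b;
  assert(!sdivIsUB(a, safeB) && "the substituted divisor is always safe");
  int32_t quot = a / safeB;
  return unsafe ? fallback : quot;
}
```

Per lane this needs extra compares (and, for the signed case, a logical and/or) on top of the select, which is consistent with the summary's remark that this approach generates worse code in the generic case than reusing the already-computed lane mask.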

Code structure wise, I see three major options:

Use a new recipe specific to div/rem.
Extend the existing VPWidenRecipe to handle the safe-divide guard if a mask is provided. DivRem is the only opcode which has this safety guard, so while doing this was straightforward, it seemed oddly coupled.
Use a new recipe for arbitrary predicated neutral element guards. This could be done for e.g. all binary ops, and is arguably a useful building block towards the VP patches. This feels a bit speculative to me, and possibly falsely generic.

I really don't have any preference as to the recipe structure, and will defer to reviewers with more context on the overall VPlan design and direction. Let me know what you want, and I'll adjust.

I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. I'd welcome input on how this should be structured, but am also hoping to defer actually implementing it to a follow up patch. :)

Diff Detail

Unit Tests: Failed

Time	Test
60,110 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp

Event Timeline

reames created this revision. Jul 20 2022, 6:10 AM

Herald added a project: Restricted Project. Jul 20 2022, 6:10 AM

Herald added subscribers: frasercrmck, luismarques, apazos and 21 others.

reames requested review of this revision. Jul 20 2022, 6:10 AM

Herald added a project: Restricted Project. Jul 20 2022, 6:10 AM

Herald added subscribers: alextsao1999, pcwang-thead, vkmr, MaskRay.

Harbormaster completed remote builds in B176489: Diff 446129. Jul 20 2022, 6:46 AM

reames edited the summary of this revision. Jul 20 2022, 8:22 AM

Div/Rem by 1 is the same as using the undisturbed LHS.

That's true for Div, but Rem by 1 would produce 0.

What does SelectionDAG do with a scalable vector division if it isn't legal for the target?

In D130164#3666181, @craig.topper wrote:

What does SelectionDAG do with a scalable vector division if it isn't legal for the target?

Not sure, but this patch shouldn't result in that being created. Such a target is responsible for setting a reasonable cost. If that cost is invalid, existing logic should bailout of vectorization for scalable vector factors.

In D130164#3666157, @craig.topper wrote:

Div/Rem by 1 is the same as using the undisturbed LHS.

That's true for Div, but Rem by 1 would produce 0.

True, didn't think carefully enough. If we end up trying to implement the undisturbed case, we can explore using an alternate value for the safe-divisor for rem. Not thinking about it real hard, int_max/uint_max might be reasonable.

On targets with a vector "div" instruction, the instruction never actually traps or otherwise misbehaves; the only reason we have an issue here is that the IR "sdiv" is defined to be instant UB. Maybe we could consider introducing a new IR operation to model this? Maybe it doesn't really matter, though; division is expensive enough that the cost of an extra "select" is probably insignificant.

Not thinking about it real hard, int_max/uint_max might be reasonable.

That doesn't work for srem; the absolute value of int_min is larger than int_max.
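The arithmetic behind this exchange can be checked with a few lines of C++. This is only a sketch to make the point concrete; `keepsLhs` is a hypothetical helper, and it is restricted to positive stand-in divisors so that the operation itself is always well defined.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if using `d` as the stand-in divisor hands back the LHS `x`
// unchanged, i.e. behaves like an "undisturbed" masked lane.
bool keepsLhs(int32_t x, int32_t d, bool isRem) {
  assert(d > 0 && "illustration only covers positive stand-in divisors");
  int32_t result = isRem ? x % d : x / d;
  return result == x;
}
```

With this helper, `keepsLhs(42, 1, false)` is true (x / 1 == x), while `keepsLhs(42, 1, true)` is false because 42 % 1 is 0. Using `INT32_MAX` as the stand-in fixes the remainder for 42, but `keepsLhs(INT32_MIN, INT32_MAX, true)` is false: the remainder there is -1, since the magnitude of the minimum value exceeds the maximum value by one.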

In D130164#3666301, @efriedma wrote:

On targets with a vector "div" instruction, the instruction never actually traps or otherwise misbehaves; the only reason we have an issue here is that the IR "sdiv" is defined to be instant UB. Maybe we could consider introducing a new IR operation to model this? Maybe it doesn't really matter, though; division is expensive enough that the cost of an extra "select" is probably insignificant.

I think we run the very real danger of letting perfection be the enemy of the good here. Once we can vectorize this at all, we can come back and explore better lowerings.

Sure, I didn't mean we should block this patch until we come up with the ideal solution. Just thought it was worth bringing up since you didn't mention it as a strategy.

reames edited the summary of this revision. Jul 20 2022, 11:50 AM

In D130164#3666366, @efriedma wrote:

Sure, I didn't mean we should block this patch until we come up with the ideal solution. Just thought it was worth bringing up since you didn't mention it as a strategy.

Fair point, added it to the strategy list above.

A random thought on this, just for possible future reference: "does not fault" does not always mean "is always fast". I'm not sure of the details here, but I've run into cases like this in the past where the non-faulting behavior performed so poorly that you had to avoid it anyway. This would be something we'd need to check for when specifying the udiv variant.

Matt added a subscriber: Matt. Jul 27 2022, 1:40 PM

fhahn added inline comments. Jul 29 2022, 7:46 AM

llvm/lib/Transforms/Vectorize/VPlan.h
948 ↗	(On Diff #446129)	Is a new recipe needed here? Would it be possible to generate a `VPWidenSelectRecipe`/ `VPInstruction` with an `Instruction::Select` opcode feeding the `VPWidenRecipe` for the `div/rem` instead?

Rework using VPInstruction as suggested by @fhahn.

I'm a bit unsure of the wiring for the VPInstruction. It appears to work, but I'm basically just copying code I don't understand. Careful review appreciated.

Harbormaster completed remote builds in B178322: Diff 448681. Jul 29 2022, 11:39 AM

ping

xbolva00 added a subscriber: xbolva00. Aug 15 2022, 9:35 AM

xbolva00 added inline comments.

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
700	Invalid?

reames added inline comments. Aug 15 2022, 9:42 AM

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
700	The comment is now stale and needs to be removed. Is that what you meant by "Invalid"?

xbolva00 added inline comments. Aug 15 2022, 9:57 AM

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
700	Yup, please update it

Address review comment and rebase

Harbormaster completed remote builds in B181338: Diff 452750. Aug 15 2022, 12:41 PM

fhahn added inline comments. Aug 16 2022, 1:07 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8343	Do you know if there's an easy way to assert this here? Seems like something that could be easily missed.

reames added inline comments. Aug 16 2022, 6:57 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8343	I don't, and this is a pretty prevalent assumption in the code already, e.g. all predicated store/load handling does the same. If anything, I might lean towards removing the comment as it may falsely give the impression the assumption is unique to this code.

LGTM. thanks! I am also adding Ayal and Gil, in case they have additional comments, so it would be good to wait a day with committing so they have a chance to chime in.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8343	Fair enough!

This revision is now accepted and ready to land. Aug 23 2022, 12:43 PM

This revision was landed with ongoing or failed builds. Aug 24 2022, 10:08 AM

Closed by commit rGf79214d1e1fd: [LV] Support predicated div/rem operations via safe-divisor select idiom (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGf79214d1e1fd: [LV] Support predicated div/rem operations via safe-divisor select idiom.

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Vectorize/

LoopVectorize.cpp

200 lines

VPRecipeBuilder.h

6 lines

test/

Transforms/

LoopVectorize/

AArch64/

scalable-predicate-instruction.ll

6 lines

sve-tail-folding.ll

54 lines

RISCV/

scalable-divrem.ll

143 lines

Diff 448681

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 4,455 Lines • ▼ Show 20 Lines	return isa<LoadInst>(I) ? !(isLegalMaskedLoad(Ty, Ptr, Alignment) ||
TTI.isLegalMaskedScatter(VTy, Alignment));		TTI.isLegalMaskedScatter(VTy, Alignment));
}		}
case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::SRem:		case Instruction::SRem:
case Instruction::URem:		case Instruction::URem:
// TODO: We can use the loop-preheader as context point here and get		// TODO: We can use the loop-preheader as context point here and get
// context sensitive reasoning		// context sensitive reasoning
return !isSafeToSpeculativelyExecute(I);		// Note: Scalable can't predicate and thus must go through the widening
		// strategy. Long term, we want fixed to make a cost based decision
		// between widening and scalarization, but for now, fixed is left
		// unconditionally using the scalar path.
		return !VF.isScalable() && !isSafeToSpeculativelyExecute(I);
}		}
return false;		return false;
}		}

bool LoopVectorizationCostModel::interleavedAccessCanBeWidened(		bool LoopVectorizationCostModel::interleavedAccessCanBeWidened(
Instruction *I, ElementCount VF) {		Instruction *I, ElementCount VF) {
assert(isAccessInterleaved(I) && "Expecting interleaved access.");		assert(isAccessInterleaved(I) && "Expecting interleaved access.");
assert(getWideningDecision(I, VF) == CM_Unknown &&		assert(getWideningDecision(I, VF) == CM_Unknown &&
▲ Show 20 Lines • Show All 2,548 Lines • ▼ Show 20 Lines	if (VF.isVector() && Phi->getParent() != TheLoop->getHeader())
CmpInst::BAD_ICMP_PREDICATE, CostKind);		CmpInst::BAD_ICMP_PREDICATE, CostKind);

return TTI.getCFInstrCost(Instruction::PHI, CostKind);		return TTI.getCFInstrCost(Instruction::PHI, CostKind);
}		}
case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::URem:		case Instruction::URem:
case Instruction::SRem:		case Instruction::SRem:
// If we have a predicated instruction, it may not be executed for each		if (VF.isVector() && blockNeedsPredicationForAnyReason(I->getParent()) &&
// vector lane. Get the scalarization cost and scale this amount by the		!isSafeToSpeculativelyExecute(I)) {
// probability of executing the predicated block. If the instruction is not		// If we're speculating lanes, we have two options - scalarization and
// predicated, we fall through to the next case.		// guarded widening.
if (VF.isVector() && isScalarWithPredication(I, VF)) {		if (isScalarWithPredication(I, VF)) {
		// Get the scalarization cost and scale this amount by the probability of
		// executing the predicated block. If the instruction is not predicated,
		// we fall through to the next case.
InstructionCost Cost = 0;		InstructionCost Cost = 0;

// These instructions have a non-void type, so account for the phi nodes		// These instructions have a non-void type, so account for the phi nodes
// that we will create. This cost is likely to be zero. The phi node		// that we will create. This cost is likely to be zero. The phi node
// cost, if any, should be scaled by the block probability because it		// cost, if any, should be scaled by the block probability because it
// models a copy at the end of each predicated block.		// models a copy at the end of each predicated block.
Cost += VF.getKnownMinValue() *		Cost += VF.getKnownMinValue() *
TTI.getCFInstrCost(Instruction::PHI, CostKind);		TTI.getCFInstrCost(Instruction::PHI, CostKind);

// The cost of the non-predicated instruction.		// The cost of the non-predicated instruction.
Cost += VF.getKnownMinValue() *		Cost += VF.getKnownMinValue() *
TTI.getArithmeticInstrCost(I->getOpcode(), RetTy, CostKind);		TTI.getArithmeticInstrCost(I->getOpcode(), RetTy, CostKind);

// The cost of insertelement and extractelement instructions needed for		// The cost of insertelement and extractelement instructions needed for
// scalarization.		// scalarization.
Cost += getScalarizationOverhead(I, VF);		Cost += getScalarizationOverhead(I, VF);

// Scale the cost by the probability of executing the predicated blocks.		// Scale the cost by the probability of executing the predicated blocks.
// This assumes the predicated block for each vector lane is equally		// This assumes the predicated block for each vector lane is equally
// likely.		// likely.
return Cost / getReciprocalPredBlockProb();		return Cost / getReciprocalPredBlockProb();
}		}

		InstructionCost Cost = 0;

		// The cost of the select guard to ensure all lanes are well defined
		// after we speculate above any internal control flow.
		Cost += TTI.getCmpSelInstrCost(
		Instruction::Select, ToVectorTy(I->getType(), VF),
		ToVectorTy(Type::getInt1Ty(I->getContext()), VF),
		CmpInst::BAD_ICMP_PREDICATE, CostKind);

		// Certain instructions can be cheaper to vectorize if they have a constant
		// second vector operand. One example of this are shifts on x86.
		Value *Op2 = I->getOperand(1);
		TargetTransformInfo::OperandValueProperties Op2VP;
		TargetTransformInfo::OperandValueKind Op2VK =
		TTI.getOperandInfo(Op2, Op2VP);
		if (Op2VK == TargetTransformInfo::OK_AnyValue && Legal->isUniform(Op2))
		Op2VK = TargetTransformInfo::OK_UniformValue;

		SmallVector<const Value *, 4> Operands(I->operand_values());
		Cost += TTI.getArithmeticInstrCost(
		I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
		Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);

		return Cost;
		}
		// We've proven all lanes safe to speculate, fall through.
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case Instruction::Add:		case Instruction::Add:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
case Instruction::FDiv:		case Instruction::FDiv:
▲ Show 20 Lines • Show All 1,223 Lines • ▼ Show 20 Lines	auto WillScalarize = [this, I](ElementCount VF) -> bool {
return CM.isScalarAfterVectorization(I, VF) ||		return CM.isScalarAfterVectorization(I, VF) ||
CM.isProfitableToScalarize(I, VF) ||		CM.isProfitableToScalarize(I, VF) ||
CM.isScalarWithPredication(I, VF);		CM.isScalarWithPredication(I, VF);
};		};
return !LoopVectorizationPlanner::getDecisionAndClampRange(WillScalarize,		return !LoopVectorizationPlanner::getDecisionAndClampRange(WillScalarize,
Range);		Range);
}		}

VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,		VPRecipeBase *VPRecipeBuilder::tryToWiden(Instruction *I,
ArrayRef<VPValue *> Operands) const {		ArrayRef<VPValue *> Operands,
auto IsVectorizableOpcode = [](unsigned Opcode) {		VPBasicBlock *VPBB, VPlanPtr &Plan) {
switch (Opcode) {		switch (I->getOpcode()) {
		default:
		return nullptr;
		case Instruction::SDiv:
		case Instruction::UDiv:
		case Instruction::SRem:
		case Instruction::URem: {
		// If not provably safe, use a select to form a safe divisor before widening the
		// div/rem operation itself. Otherwise fall through to general handling below.
		// NOTE: There's a subtle assumption here that we have no exceptional exits within
		// a block, otherwise we'd need to prove speculation safety without explicit
		// block predication. If that assumption is ever invalidated, this code needs
		// updated.
		if (CM.blockNeedsPredicationForAnyReason(I->getParent()) &&
		!isSafeToSpeculativelyExecute(I)) {
		SmallVector<VPValue *> Ops(Operands.begin(), Operands.end());
		VPValue *Mask = createBlockInMask(I->getParent(), Plan);
		VPValue *One =
		Plan->getOrAddExternalDef(ConstantInt::get(I->getType(), 1u, false));
		auto *SafeRHS =
		new VPInstruction(Instruction::Select, {Mask, Ops[1], One},
		I->getDebugLoc());
		VPBB->appendRecipe(SafeRHS);
		Ops[1] = SafeRHS;
		return new VPWidenRecipe(*I, make_range(Ops.begin(), Ops.end()));
		}
		LLVM_FALLTHROUGH;
		}
case Instruction::Add:		case Instruction::Add:
case Instruction::And:		case Instruction::And:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::BitCast:		case Instruction::BitCast:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::FDiv:		case Instruction::FDiv:
case Instruction::FMul:		case Instruction::FMul:
case Instruction::FNeg:		case Instruction::FNeg:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::FRem:		case Instruction::FRem:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::Or:		case Instruction::Or:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::SDiv:
case Instruction::Select:		case Instruction::Select:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::Shl:		case Instruction::Shl:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::SRem:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::UDiv:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::URem:
case Instruction::Xor:		case Instruction::Xor:
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::Freeze:		case Instruction::Freeze:
return true;
}
return false;
};

if (!IsVectorizableOpcode(I->getOpcode()))
return nullptr;

// Success: widen this instruction.
return new VPWidenRecipe(*I, make_range(Operands.begin(), Operands.end()));		return new VPWidenRecipe(*I, make_range(Operands.begin(), Operands.end()));
		};
}		}

void VPRecipeBuilder::fixHeaderPhis() {		void VPRecipeBuilder::fixHeaderPhis() {
BasicBlock *OrigLatch = OrigLoop->getLoopLatch();		BasicBlock *OrigLatch = OrigLoop->getLoopLatch();
for (VPHeaderPHIRecipe *R : PhisToFix) {		for (VPHeaderPHIRecipe *R : PhisToFix) {
auto *PN = cast<PHINode>(R->getUnderlyingValue());		auto *PN = cast<PHINode>(R->getUnderlyingValue());
VPRecipeBase *IncR =		VPRecipeBase *IncR =
getRecipe(cast<Instruction>(PN->getIncomingValueForBlock(OrigLatch)));		getRecipe(cast<Instruction>(PN->getIncomingValueForBlock(OrigLatch)));
▲ Show 20 Lines • Show All 118 Lines • ▼ Show 20 Lines	VPRecipeBuilder::createReplicateRegion(VPReplicateRecipe *PredRecipe,
VPBlockUtils::connectBlocks(Pred, Exiting);		VPBlockUtils::connectBlocks(Pred, Exiting);

return Region;		return Region;
}		}

VPRecipeOrVPValueTy		VPRecipeOrVPValueTy
VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,		VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
ArrayRef<VPValue *> Operands,		ArrayRef<VPValue *> Operands,
VFRange &Range, VPlanPtr &Plan) {		VFRange &Range, VPBasicBlock *VPBB,
		VPlanPtr &Plan) {
// First, check for specific widening recipes that deal with inductions, Phi		// First, check for specific widening recipes that deal with inductions, Phi
// nodes, calls and memory operations.		// nodes, calls and memory operations.
VPRecipeBase *Recipe;		VPRecipeBase *Recipe;
if (auto Phi = dyn_cast<PHINode>(Instr)) {		if (auto Phi = dyn_cast<PHINode>(Instr)) {
if (Phi->getParent() != OrigLoop->getHeader())		if (Phi->getParent() != OrigLoop->getHeader())
return tryToBlend(Phi, Operands, Plan);		return tryToBlend(Phi, Operands, Plan);
if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, *Plan, Range)))		if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, *Plan, Range)))
return toVPRecipeResult(Recipe);		return toVPRecipeResult(Recipe);
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,

if (auto *SI = dyn_cast<SelectInst>(Instr)) {		if (auto *SI = dyn_cast<SelectInst>(Instr)) {
bool InvariantCond =		bool InvariantCond =
PSE.getSE()->isLoopInvariant(PSE.getSCEV(SI->getOperand(0)), OrigLoop);		PSE.getSE()->isLoopInvariant(PSE.getSCEV(SI->getOperand(0)), OrigLoop);
return toVPRecipeResult(new VPWidenSelectRecipe(		return toVPRecipeResult(new VPWidenSelectRecipe(
*SI, make_range(Operands.begin(), Operands.end()), InvariantCond));		*SI, make_range(Operands.begin(), Operands.end()), InvariantCond));
}		}

return toVPRecipeResult(tryToWiden(Instr, Operands));		return toVPRecipeResult(tryToWiden(Instr, Operands, VPBB, Plan));
}		}

void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,		void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
ElementCount MaxVF) {		ElementCount MaxVF) {
assert(OrigLoop->isInnermost() && "Inner loop expected.");		assert(OrigLoop->isInnermost() && "Inner loop expected.");

// Add assume instructions we need to drop to DeadInstructions, to prevent		// Add assume instructions we need to drop to DeadInstructions, to prevent
// them from being added to the VPlan.		// them from being added to the VPlan.
▲ Show 20 Lines • Show All 254 Lines • ▼ Show 20 Lines	for (Instruction &I : BB->instructionsWithoutDebug()) {
// Invariant stores inside loop will be deleted and a single store		// Invariant stores inside loop will be deleted and a single store
// with the final reduction value will be added to the exit block		// with the final reduction value will be added to the exit block
StoreInst *SI;		StoreInst *SI;
if ((SI = dyn_cast<StoreInst>(&I)) &&		if ((SI = dyn_cast<StoreInst>(&I)) &&
Legal->isInvariantAddressOfReduction(SI->getPointerOperand()))		Legal->isInvariantAddressOfReduction(SI->getPointerOperand()))
continue;		continue;

if (auto RecipeOrValue = RecipeBuilder.tryToCreateWidenRecipe(		if (auto RecipeOrValue = RecipeBuilder.tryToCreateWidenRecipe(
Instr, Operands, Range, Plan)) {		Instr, Operands, Range, VPBB, Plan)) {
// If Instr can be simplified to an existing VPValue, use it.		// If Instr can be simplified to an existing VPValue, use it.
if (RecipeOrValue.is<VPValue *>()) {		if (RecipeOrValue.is<VPValue *>()) {
auto VPV = RecipeOrValue.get<VPValue *>();		auto VPV = RecipeOrValue.get<VPValue *>();
Plan->addVPValue(Instr, VPV);		Plan->addVPValue(Instr, VPV);
// If the re-used value is a recipe, register the recipe for the		// If the re-used value is a recipe, register the recipe for the
// instruction, in case the recipe for Instr needs to be recorded.		// instruction, in case the recipe for Instr needs to be recorded.
if (auto *R = dyn_cast_or_null<VPRecipeBase>(VPV->getDef()))		if (auto *R = dyn_cast_or_null<VPRecipeBase>(VPV->getDef()))
RecipeBuilder.setRecipe(Instr, R);		RecipeBuilder.setRecipe(Instr, R);
▲ Show 20 Lines • Show All 1,741 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h

Show First 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	class VPRecipeBuilder {
/// return a new VPWidenCallRecipe. Range.End may be decreased to ensure same		/// return a new VPWidenCallRecipe. Range.End may be decreased to ensure same
/// decision from \p Range.Start to \p Range.End.		/// decision from \p Range.Start to \p Range.End.
VPWidenCallRecipe *tryToWidenCall(CallInst *CI, ArrayRef<VPValue *> Operands,		VPWidenCallRecipe *tryToWidenCall(CallInst *CI, ArrayRef<VPValue *> Operands,
VFRange &Range) const;		VFRange &Range) const;

/// Check if \p I has an opcode that can be widened and return a VPWidenRecipe		/// Check if \p I has an opcode that can be widened and return a VPWidenRecipe
/// if it can. The function should only be called if the cost-model indicates		/// if it can. The function should only be called if the cost-model indicates
/// that widening should be performed.		/// that widening should be performed.
VPWidenRecipe *tryToWiden(Instruction *I, ArrayRef<VPValue *> Operands) const;		VPRecipeBase *tryToWiden(Instruction *I, ArrayRef<VPValue *> Operands,
		VPBasicBlock *VPBB, VPlanPtr &Plan);

/// Return a VPRecipeOrValueTy with VPRecipeBase * being set. This can be used to force the use as VPRecipeBase* for recipe sub-types that also inherit from VPValue.		/// Return a VPRecipeOrValueTy with VPRecipeBase * being set. This can be used to force the use as VPRecipeBase* for recipe sub-types that also inherit from VPValue.
VPRecipeOrVPValueTy toVPRecipeResult(VPRecipeBase *R) const { return R; }		VPRecipeOrVPValueTy toVPRecipeResult(VPRecipeBase *R) const { return R; }

public:		public:
VPRecipeBuilder(Loop *OrigLoop, const TargetLibraryInfo *TLI,		VPRecipeBuilder(Loop *OrigLoop, const TargetLibraryInfo *TLI,
LoopVectorizationLegality *Legal,		LoopVectorizationLegality *Legal,
LoopVectorizationCostModel &CM,		LoopVectorizationCostModel &CM,
PredicatedScalarEvolution &PSE, VPBuilder &Builder)		PredicatedScalarEvolution &PSE, VPBuilder &Builder)
: OrigLoop(OrigLoop), TLI(TLI), Legal(Legal), CM(CM), PSE(PSE),		: OrigLoop(OrigLoop), TLI(TLI), Legal(Legal), CM(CM), PSE(PSE),
Builder(Builder) {}		Builder(Builder) {}

/// Check if an existing VPValue can be used for \p Instr or a recipe can be		/// Check if an existing VPValue can be used for \p Instr or a recipe can be
/// create for \p I withing the given VF \p Range. If an existing VPValue can		/// create for \p I withing the given VF \p Range. If an existing VPValue can
/// be used or if a recipe can be created, return it. Otherwise return a		/// be used or if a recipe can be created, return it. Otherwise return a
/// VPRecipeOrVPValueTy with nullptr.		/// VPRecipeOrVPValueTy with nullptr.
VPRecipeOrVPValueTy tryToCreateWidenRecipe(Instruction *Instr,		VPRecipeOrVPValueTy tryToCreateWidenRecipe(Instruction *Instr,
ArrayRef<VPValue *> Operands,		ArrayRef<VPValue *> Operands,
VFRange &Range, VPlanPtr &Plan);		VFRange &Range, VPBasicBlock *VPBB,
		VPlanPtr &Plan);

/// Set the recipe created for given ingredient. This operation is a no-op for		/// Set the recipe created for given ingredient. This operation is a no-op for
/// ingredients that were not marked using a nullptr entry in the map.		/// ingredients that were not marked using a nullptr entry in the map.
void setRecipe(Instruction *I, VPRecipeBase *R) {		void setRecipe(Instruction *I, VPRecipeBase *R) {
if (!Ingredient2Recipe.count(I))		if (!Ingredient2Recipe.count(I))
return;		return;
assert(Ingredient2Recipe[I] == nullptr &&		assert(Ingredient2Recipe[I] == nullptr &&
"Recipe already set for ingredient");		"Recipe already set for ingredient");
▲ Show 20 Lines • Show All 50 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/AArch64/scalable-predicate-instruction.ll

	; RUN: opt < %s -loop-vectorize -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -S | FileCheck %s
	; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	; The test predication_in_loop corresponds			; The test predication_in_loop corresponds
	; to the following function			; to the following function
	; for (long long i = 0; i < 1024; i++) {			; for (long long i = 0; i < 1024; i++) {
	; if (cond[i])			; if (cond[i])
	; a[i] /= b[i];			; a[i] /= b[i];
	; }			; }

	; Scalarizing the division cannot be done for scalable vectors at the moment
	; when the loop needs predication
	; Future implementation of llvm.vp could allow this to happen

	define void @predication_in_loop(i32* %a, i32* %b, i32* %cond) #0 {			define void @predication_in_loop(i32* %a, i32* %b, i32* %cond) #0 {
	; CHECK-LABEL: @predication_in_loop			; CHECK-LABEL: @predication_in_loop
	; CHECK-NOT: sdiv <vscale x 4 x i32>			; CHECK: sdiv <vscale x 4 x i32>
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.inc, %entry			for.cond.cleanup: ; preds = %for.inc, %entry
	ret void			ret void

	for.body: ; preds = %entry, %for.inc			for.body: ; preds = %entry, %for.inc
	[67 lines not shown]
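
For readers following the `predication_in_loop` test, here is a minimal scalar C++ sketch of the safe-divisor select idiom described in the summary, applied to the `if (cond[i]) a[i] /= b[i];` loop. It is only a per-lane model of the transformation, not the vectorizer's implementation; the function name `predicatedDivSafeDivisor` and the use of `std::vector` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not LLVM code): per-lane model of the safe-divisor idiom.
// Every lane performs the division unconditionally, but lanes whose predicate
// is false divide by 1, which cannot trap or trigger UB.
void predicatedDivSafeDivisor(std::vector<int32_t> &a,
                              const std::vector<int32_t> &b,
                              const std::vector<int32_t> &cond) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    bool mask = cond[i] != 0;
    int32_t safeDivisor = mask ? b[i] : 1;  // select placed before the div
    int32_t quotient = a[i] / safeDivisor;  // division executed on every lane
    a[i] = mask ? quotient : a[i];          // inactive lanes keep the old value
  }
}
```

Lanes with a true predicate still divide by the real `b[i]`, so the sketch has the same preconditions as the original loop for those lanes.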

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll

	[691 lines not shown]

	while.end.loopexit: ; preds = %while.body			while.end.loopexit: ; preds = %while.body
	ret void			ret void
	}			}

	; Negative tests where we don't expect tail-folding			; Negative tests where we don't expect tail-folding

	; Integer divides can throw exceptions and since we can't scalarize conditional			; Integer divides can throw exceptions and since we can't scalarize conditional
	; divides for scalable vectors we just don't bother vectorizing.			; divides for scalable vectors we just don't bother vectorizing.
				xbolva00: Invalid?
				reames (author): The comment is now stale and needs to be removed. Is that what you meant by "Invalid"?
				xbolva00: Yup, please update it
	define void @simple_idiv(i32* noalias %dst, i32* noalias %src, i64 %n) #0 {			define void @simple_idiv(i32* noalias %dst, i32* noalias %src, i64 %n) #0 {
	; CHECK-LABEL: @simple_idiv(			; CHECK-LABEL: @simple_idiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
				; CHECK-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
				; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
				; CHECK-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
				; CHECK-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
				; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[TMP6]], 4
				; CHECK-NEXT: [[TMP8:%.*]] = sub i64 [[TMP7]], 1
				; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP8]]
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP5]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
				; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 [[UMAX]])
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT3:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <vscale x 4 x i1> [ [[ACTIVE_LANE_MASK_ENTRY]], [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP9:%.*]] = add i64 [[INDEX1]], 0
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[SRC:%.*]], i64 [[TMP9]]
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[TMP9]]
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0
				; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <vscale x 4 x i32>*
				; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP13]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP11]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <vscale x 4 x i32>*
				; CHECK-NEXT: [[WIDE_MASKED_LOAD2:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP15]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)
				; CHECK-NEXT: [[TMP16:%.*]] = select <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD2]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP17:%.*]] = udiv <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], [[TMP16]]
				; CHECK-NEXT: [[TMP18:%.*]] = bitcast i32* [[TMP14]] to <vscale x 4 x i32>*
				; CHECK-NEXT: call void @llvm.masked.store.nxv4i32.p0nxv4i32(<vscale x 4 x i32> [[TMP17]], <vscale x 4 x i32>* [[TMP18]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]])
				; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP20:%.*]] = mul i64 [[TMP19]], 4
				; CHECK-NEXT: [[INDEX_NEXT3]] = add i64 [[INDEX1]], [[TMP20]]
				; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[INDEX_NEXT3]], i64 [[UMAX]])
				; CHECK-NEXT: [[TMP21:%.*]] = xor <vscale x 4 x i1> [[ACTIVE_LANE_MASK_NEXT]], shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP22:%.*]] = extractelement <vscale x 4 x i1> [[TMP21]], i32 0
				; CHECK-NEXT: br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[WHILE_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[WHILE_BODY:%.*]]			; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
	; CHECK: while.body:			; CHECK: while.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[WHILE_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr i32, i32* [[SRC:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr i32, i32* [[SRC]], i64 [[INDEX]]
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr i32, i32* [[DST]], i64 [[INDEX]]
	; CHECK-NEXT: [[VAL1:%.*]] = load i32, i32* [[GEP1]], align 4			; CHECK-NEXT: [[VAL1:%.*]] = load i32, i32* [[GEP1]], align 4
	; CHECK-NEXT: [[VAL2:%.*]] = load i32, i32* [[GEP2]], align 4			; CHECK-NEXT: [[VAL2:%.*]] = load i32, i32* [[GEP2]], align 4
	; CHECK-NEXT: [[RES:%.*]] = udiv i32 [[VAL1]], [[VAL2]]			; CHECK-NEXT: [[RES:%.*]] = udiv i32 [[VAL1]], [[VAL2]]
	; CHECK-NEXT: store i32 [[RES]], i32* [[GEP2]], align 4			; CHECK-NEXT: store i32 [[RES]], i32* [[GEP2]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nsw i64 [[INDEX]], 1			; CHECK-NEXT: [[INDEX_NEXT]] = add nsw i64 [[INDEX]], 1
	; CHECK-NEXT: [[CMP10:%.*]] = icmp ult i64 [[INDEX_NEXT]], [[N:%.*]]			; CHECK-NEXT: [[CMP10:%.*]] = icmp ult i64 [[INDEX_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[CMP10]], label [[WHILE_BODY]], label [[WHILE_END_LOOPEXIT:%.*]], !llvm.loop [[LOOP20:![0-9]+]]			; CHECK-NEXT: br i1 [[CMP10]], label [[WHILE_BODY]], label [[WHILE_END_LOOPEXIT]], !llvm.loop [[LOOP21:![0-9]+]]
	; CHECK: while.end.loopexit:			; CHECK: while.end.loopexit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %while.body			br label %while.body

	while.body: ; preds = %while.body, %entry			while.body: ; preds = %while.body, %entry
	%index = phi i64 [ %index.next, %while.body ], [ 0, %entry ]			%index = phi i64 [ %index.next, %while.body ], [ 0, %entry ]
	[21 lines not shown]
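
As a reading aid for the new `simple_idiv` CHECK lines, the following is a rough scalar C++ model of tail folding combined with the safe-divisor select. It is a sketch under stated assumptions rather than what the vectorizer emits: `kLanes` is a hypothetical fixed lane count standing in for `vscale x 4`, inactive lanes of the masked loads are modelled as 0 (the IR uses poison), and the trip count is taken as `dst.size()` instead of the test's `umax(n, 1)`. `src` is assumed to be at least as long as `dst`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed lane count standing in for the scalable VF in the test.
constexpr std::size_t kLanes = 4;

// Hypothetical model of the tail-folded loop: dst[i] = src[i] / dst[i].
void tailFoldedUDiv(std::vector<uint32_t> &dst,
                    const std::vector<uint32_t> &src) {
  const std::size_t n = dst.size();
  for (std::size_t index = 0; index < n; index += kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      const std::size_t i = index + lane;
      const bool active = i < n;                  // models the active lane mask
      const uint32_t num = active ? src[i] : 0;   // masked load of src
      const uint32_t den = active ? dst[i] : 0;   // masked load of dst
      const uint32_t safeDen = active ? den : 1;  // select: divisor or splat(1)
      const uint32_t quotient = num / safeDen;    // udiv executed on every lane
      if (active)
        dst[i] = quotient;                        // masked store
    }
  }
}
```

With five elements, the second pass through the outer loop has one active lane and three inactive lanes; the inactive lanes divide by 1 and store nothing.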

llvm/test/Transforms/LoopVectorize/RISCV/scalable-divrem.ll

	[243 lines not shown]

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @predicated_udiv(ptr noalias nocapture %a, i64 %v, i64 %n) {			define void @predicated_udiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
	; CHECK-LABEL: @predicated_udiv(			; CHECK-LABEL: @predicated_udiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP0]]
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP2]]
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 1 x i64>, ptr [[TMP4]], align 8
				; CHECK-NEXT: [[TMP5:%.*]] = icmp ne <vscale x 1 x i64> [[BROADCAST_SPLAT]], zeroinitializer
				; CHECK-NEXT: [[TMP6:%.*]] = select <vscale x 1 x i1> [[TMP5]], <vscale x 1 x i64> [[BROADCAST_SPLAT]], <vscale x 1 x i64> shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 1, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP7:%.*]] = udiv <vscale x 1 x i64> [[WIDE_LOAD]], [[TMP6]]
				; CHECK-NEXT: [[TMP8:%.*]] = xor <vscale x 1 x i1> [[TMP5]], shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer)
				; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 1 x i1> [[TMP5]], <vscale x 1 x i64> [[TMP7]], <vscale x 1 x i64> [[WIDE_LOAD]]
				; CHECK-NEXT: store <vscale x 1 x i64> [[PREDPHI]], ptr [[TMP4]], align 8
				; CHECK-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]
				; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
	; CHECK-NEXT: [[ELEM:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: [[ELEM:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[C:%.*]] = icmp ne i64 [[V:%.*]], 0			; CHECK-NEXT: [[C:%.*]] = icmp ne i64 [[V]], 0
	; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]			; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]
	; CHECK: do_op:			; CHECK: do_op:
	; CHECK-NEXT: [[DIVREM:%.*]] = udiv i64 [[ELEM]], [[V]]			; CHECK-NEXT: [[DIVREM:%.*]] = udiv i64 [[ELEM]], [[V]]
	; CHECK-NEXT: br label [[LATCH]]			; CHECK-NEXT: br label [[LATCH]]
	; CHECK: latch:			; CHECK: latch:
	; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]			; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]
	; CHECK-NEXT: store i64 [[PHI]], ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: store i64 [[PHI]], ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
	[13 lines not shown]

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @predicated_sdiv(ptr noalias nocapture %a, i64 %v, i64 %n) {			define void @predicated_sdiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
	; CHECK-LABEL: @predicated_sdiv(			; CHECK-LABEL: @predicated_sdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP0]]
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP2]]
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 1 x i64>, ptr [[TMP4]], align 8
				; CHECK-NEXT: [[TMP5:%.*]] = icmp ne <vscale x 1 x i64> [[BROADCAST_SPLAT]], zeroinitializer
				; CHECK-NEXT: [[TMP6:%.*]] = select <vscale x 1 x i1> [[TMP5]], <vscale x 1 x i64> [[BROADCAST_SPLAT]], <vscale x 1 x i64> shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 1, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP7:%.*]] = sdiv <vscale x 1 x i64> [[WIDE_LOAD]], [[TMP6]]
				; CHECK-NEXT: [[TMP8:%.*]] = xor <vscale x 1 x i1> [[TMP5]], shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer)
				; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 1 x i1> [[TMP5]], <vscale x 1 x i64> [[TMP7]], <vscale x 1 x i64> [[WIDE_LOAD]]
				; CHECK-NEXT: store <vscale x 1 x i64> [[PREDPHI]], ptr [[TMP4]], align 8
				; CHECK-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]
				; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
	; CHECK-NEXT: [[ELEM:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: [[ELEM:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[C:%.*]] = icmp ne i64 [[V:%.*]], 0			; CHECK-NEXT: [[C:%.*]] = icmp ne i64 [[V]], 0
	; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]			; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]
	; CHECK: do_op:			; CHECK: do_op:
	; CHECK-NEXT: [[DIVREM:%.*]] = sdiv i64 [[ELEM]], [[V]]			; CHECK-NEXT: [[DIVREM:%.*]] = sdiv i64 [[ELEM]], [[V]]
	; CHECK-NEXT: br label [[LATCH]]			; CHECK-NEXT: br label [[LATCH]]
	; CHECK: latch:			; CHECK: latch:
	; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]			; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]
	; CHECK-NEXT: store i64 [[PHI]], ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: store i64 [[PHI]], ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
	[35 lines not shown]
	; CHECK-NEXT: [[TMP5:%.*]] = icmp ne <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 42, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP5:%.*]] = icmp ne <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 42, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP6:%.*]] = udiv <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 27, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP6:%.*]] = udiv <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 27, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP7:%.*]] = xor <vscale x 1 x i1> [[TMP5]], shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP7:%.*]] = xor <vscale x 1 x i1> [[TMP5]], shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer)
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 1 x i1> [[TMP5]], <vscale x 1 x i64> [[TMP6]], <vscale x 1 x i64> [[WIDE_LOAD]]			; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 1 x i1> [[TMP5]], <vscale x 1 x i64> [[TMP6]], <vscale x 1 x i64> [[WIDE_LOAD]]
	; CHECK-NEXT: store <vscale x 1 x i64> [[PREDPHI]], ptr [[TMP4]], align 8			; CHECK-NEXT: store <vscale x 1 x i64> [[PREDPHI]], ptr [[TMP4]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
	; CHECK-NEXT: [[ELEM:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: [[ELEM:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[C:%.*]] = icmp ne i64 [[ELEM]], 42			; CHECK-NEXT: [[C:%.*]] = icmp ne i64 [[ELEM]], 42
	; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]			; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]
	; CHECK: do_op:			; CHECK: do_op:
	; CHECK-NEXT: [[DIVREM:%.*]] = udiv i64 [[ELEM]], 27			; CHECK-NEXT: [[DIVREM:%.*]] = udiv i64 [[ELEM]], 27
	; CHECK-NEXT: br label [[LATCH]]			; CHECK-NEXT: br label [[LATCH]]
	; CHECK: latch:			; CHECK: latch:
	; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]			; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]
	; CHECK-NEXT: store i64 [[PHI]], ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: store i64 [[PHI]], ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
	[35 lines not shown]
	; CHECK-NEXT: [[TMP5:%.*]] = icmp ne <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 42, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP5:%.*]] = icmp ne <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 42, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP6:%.*]] = sdiv <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 27, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP6:%.*]] = sdiv <vscale x 1 x i64> [[WIDE_LOAD]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 27, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP7:%.*]] = xor <vscale x 1 x i1> [[TMP5]], shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP7:%.*]] = xor <vscale x 1 x i1> [[TMP5]], shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer)
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 1 x i1> [[TMP5]], <vscale x 1 x i64> [[TMP6]], <vscale x 1 x i64> [[WIDE_LOAD]]			; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 1 x i1> [[TMP5]], <vscale x 1 x i64> [[TMP6]], <vscale x 1 x i64> [[WIDE_LOAD]]
	; CHECK-NEXT: store <vscale x 1 x i64> [[PREDPHI]], ptr [[TMP4]], align 8			; CHECK-NEXT: store <vscale x 1 x i64> [[PREDPHI]], ptr [[TMP4]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
	; CHECK-NEXT: [[ELEM:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: [[ELEM:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[C:%.*]] = icmp ne i64 [[ELEM]], 42			; CHECK-NEXT: [[C:%.*]] = icmp ne i64 [[ELEM]], 42
	; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]			; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]
	; CHECK: do_op:			; CHECK: do_op:
	; CHECK-NEXT: [[DIVREM:%.*]] = sdiv i64 [[ELEM]], 27			; CHECK-NEXT: [[DIVREM:%.*]] = sdiv i64 [[ELEM]], 27
	; CHECK-NEXT: br label [[LATCH]]			; CHECK-NEXT: br label [[LATCH]]
	; CHECK: latch:			; CHECK: latch:
	; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]			; CHECK-NEXT: [[PHI:%.*]] = phi i64 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]
	; CHECK-NEXT: store i64 [[PHI]], ptr [[ARRAYIDX]], align 8			; CHECK-NEXT: store i64 [[PHI]], ptr [[ARRAYIDX]], align 8
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
	[13 lines not shown]

	for.end:			for.end:
	ret void			ret void
	}			}
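
The `predicated_sdiv_by_minus_one` test that follows guards a signed division by -1 against the one dividend that overflows. A minimal C++ sketch of the same shape is given here as an illustration only: the name `guardedSDivByMinusOne` is hypothetical, and the sketch uses `int32_t` with `INT32_MIN` rather than the test's `i8` with -128, because C++ integer promotion would hide the overflow for 8-bit operands.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not LLVM code): models one lane of a predicated
// "elem / -1" where the predicate excludes the overflowing dividend.
int32_t guardedSDivByMinusOne(int32_t elem) {
  constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
  const bool mask = elem != kMin;             // lane predicate
  const int32_t safeDivisor = mask ? -1 : 1;  // select: real divisor or 1
  const int32_t quotient = elem / safeDivisor;  // safe for every input
  return mask ? quotient : elem;              // inactive lane keeps elem
}
```

Because the masked-off lane divides by 1 instead of -1, the minimum value passes through the division unchanged and no overflowing `sdiv` is ever executed.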

	define void @predicated_sdiv_by_minus_one(ptr noalias nocapture %a, i64 %n) {			define void @predicated_sdiv_by_minus_one(ptr noalias nocapture %a, i64 %n) {
	; CHECK-LABEL: @predicated_sdiv_by_minus_one(			; CHECK-LABEL: @predicated_sdiv_by_minus_one(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 16
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 16
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[TMP6]], 0
				; CHECK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 1
				; CHECK-NEXT: [[TMP9:%.*]] = add i64 [[INDEX]], [[TMP8]]
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[TMP9]]
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i8, ptr [[TMP10]], i32 0
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 8 x i8>, ptr [[TMP12]], align 1
				; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP14:%.*]] = mul i32 [[TMP13]], 8
				; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i8, ptr [[TMP10]], i32 [[TMP14]]
				; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <vscale x 8 x i8>, ptr [[TMP15]], align 1
				; CHECK-NEXT: [[TMP16:%.*]] = icmp ne <vscale x 8 x i8> [[WIDE_LOAD]], shufflevector (<vscale x 8 x i8> insertelement (<vscale x 8 x i8> poison, i8 -128, i32 0), <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP17:%.*]] = icmp ne <vscale x 8 x i8> [[WIDE_LOAD1]], shufflevector (<vscale x 8 x i8> insertelement (<vscale x 8 x i8> poison, i8 -128, i32 0), <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP18:%.*]] = select <vscale x 8 x i1> [[TMP16]], <vscale x 8 x i8> shufflevector (<vscale x 8 x i8> insertelement (<vscale x 8 x i8> poison, i8 -1, i32 0), <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer), <vscale x 8 x i8> shufflevector (<vscale x 8 x i8> insertelement (<vscale x 8 x i8> poison, i8 1, i32 0), <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP19:%.*]] = select <vscale x 8 x i1> [[TMP17]], <vscale x 8 x i8> shufflevector (<vscale x 8 x i8> insertelement (<vscale x 8 x i8> poison, i8 -1, i32 0), <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer), <vscale x 8 x i8> shufflevector (<vscale x 8 x i8> insertelement (<vscale x 8 x i8> poison, i8 1, i32 0), <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP20:%.*]] = sdiv <vscale x 8 x i8> [[WIDE_LOAD]], [[TMP18]]
				; CHECK-NEXT: [[TMP21:%.*]] = sdiv <vscale x 8 x i8> [[WIDE_LOAD1]], [[TMP19]]
				; CHECK-NEXT: [[TMP22:%.*]] = xor <vscale x 8 x i1> [[TMP16]], shufflevector (<vscale x 8 x i1> insertelement (<vscale x 8 x i1> poison, i1 true, i32 0), <vscale x 8 x i1> poison, <vscale x 8 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP23:%.*]] = xor <vscale x 8 x i1> [[TMP17]], shufflevector (<vscale x 8 x i1> insertelement (<vscale x 8 x i1> poison, i1 true, i32 0), <vscale x 8 x i1> poison, <vscale x 8 x i32> zeroinitializer)
				; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 8 x i1> [[TMP16]], <vscale x 8 x i8> [[TMP20]], <vscale x 8 x i8> [[WIDE_LOAD]]
				; CHECK-NEXT: [[PREDPHI2:%.*]] = select <vscale x 8 x i1> [[TMP17]], <vscale x 8 x i8> [[TMP21]], <vscale x 8 x i8> [[WIDE_LOAD1]]
				; CHECK-NEXT: store <vscale x 8 x i8> [[PREDPHI]], ptr [[TMP12]], align 1
				; CHECK-NEXT: [[TMP24:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[TMP25:%.*]] = mul i32 [[TMP24]], 8
				; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i8, ptr [[TMP10]], i32 [[TMP25]]
				; CHECK-NEXT: store <vscale x 8 x i8> [[PREDPHI2]], ptr [[TMP26]], align 1
				; CHECK-NEXT: [[TMP27:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP28:%.*]] = mul i64 [[TMP27]], 16
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP28]]
				; CHECK-NEXT: [[TMP29:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[IV]]
	; CHECK-NEXT: [[ELEM:%.*]] = load i8, ptr [[ARRAYIDX]], align 1			; CHECK-NEXT: [[ELEM:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[C:%.*]] = icmp ne i8 [[ELEM]], -128			; CHECK-NEXT: [[C:%.*]] = icmp ne i8 [[ELEM]], -128
	; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]			; CHECK-NEXT: br i1 [[C]], label [[DO_OP:%.*]], label [[LATCH]]
	; CHECK: do_op:			; CHECK: do_op:
	; CHECK-NEXT: [[DIVREM:%.*]] = sdiv i8 [[ELEM]], -1			; CHECK-NEXT: [[DIVREM:%.*]] = sdiv i8 [[ELEM]], -1
	; CHECK-NEXT: br label [[LATCH]]			; CHECK-NEXT: br label [[LATCH]]
	; CHECK: latch:			; CHECK: latch:
	; CHECK-NEXT: [[PHI:%.*]] = phi i8 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]			; CHECK-NEXT: [[PHI:%.*]] = phi i8 [ [[ELEM]], [[FOR_BODY]] ], [ [[DIVREM]], [[DO_OP]] ]
	; CHECK-NEXT: store i8 [[PHI]], ptr [[ARRAYIDX]], align 1			; CHECK-NEXT: store i8 [[PHI]], ptr [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]