This is an archive of the discontinued LLVM Phabricator instance.


Differential D90343

[POC][LoopVectorizer] Vectorize a simple loop with a scalable VF.
AbandonedPublic

Authored by sdesmalen on Oct 28 2020, 2:09 PM.

Details

Reviewers: None

Summary

This patch is part of a proof of concept for vectorising a loop using
scalable vectors. The patch is shared for reference and there is no
expectation for this patch to land in the current form.

Removed a number of asserts that were previously added to prevent vectorization for scalable VFs.
Steps are now scaled by vscale, a runtime value.
Temporarily circumvents the cost model so that it can be implemented separately.
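
The scaling of steps by vscale can be illustrated outside of LLVM. The following is a minimal plain-C++ sketch, not the patch's code: `ElementCountModel` and `stepForVF` are hypothetical names that only model the arithmetic, and `VScale` is passed in as an ordinary argument, whereas in IR it would come from a call to `llvm.vscale`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount: a known minimum number of
// elements, optionally multiplied by the runtime value vscale.
struct ElementCountModel {
  unsigned KnownMin;
  bool Scalable;
};

// Models "Step multiplied by VF". For a fixed VF the result is a compile-time
// constant; for a scalable VF it also depends on vscale, which is only known
// at run time.
int64_t stepForVF(int64_t Step, ElementCountModel VF, uint64_t VScale) {
  assert(VF.KnownMin > 0 && "a VF covers at least one element");
  int64_t Fixed = Step * static_cast<int64_t>(VF.KnownMin);
  return VF.Scalable ? Fixed * static_cast<int64_t>(VScale) : Fixed;
}
```

Under these assumptions, a VF of `vscale x 4` with an unroll factor of 2 gives a loop step of 8 elements per unit of vscale, so 16 elements when vscale is 2.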

This vectorizes:

void loop(int N, double *a, double *b) {
  #pragma clang loop vectorize_width(4, scalable)
  for (int i = 0; i < N; i++) {
    a[i] = b[i] + 1.0;
  }   
}
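
As a rough mental model of the result (an emulation in scalar C++, not the IR the vectorizer emits), the loop is split into an unpredicated vector body that consumes `4 * vscale` elements per iteration and a scalar tail for the remainder. `loopEmulated` and its explicit `vscale` parameter are assumptions made for illustration; on real hardware vscale is fixed by the implementation and read at run time.

```cpp
#include <cassert>

void loopEmulated(int N, double *a, const double *b, int vscale) {
  assert(vscale > 0 && "vscale is a positive runtime constant");
  // Elements handled per vector iteration for a VF of "vscale x 4".
  const int Step = 4 * vscale;
  // Vector trip count: N rounded down to a multiple of Step.
  const int VecTrip = N > 0 ? N - (N % Step) : 0;

  int i = 0;
  // Unpredicated vector body: the inner loop stands in for one wide
  // load, add and store.
  for (; i < VecTrip; i += Step)
    for (int Lane = 0; Lane < Step; ++Lane)
      a[i + Lane] = b[i + Lane] + 1.0;

  // Scalar tail: the remaining N % Step iterations.
  for (; i < N; ++i)
    a[i] = b[i] + 1.0;
}
```

The result is the same for any positive `vscale`; only the split between vector body and scalar tail changes.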

Diff Detail

Event Timeline

sdesmalen created this revision. Oct 28 2020, 2:09 PM

Herald added a project: Restricted Project. Oct 28 2020, 2:09 PM

Herald added subscribers: dexonsmith, dmgreen, hiraditya.

sdesmalen requested review of this revision. Oct 28 2020, 2:09 PM

sdesmalen added a parent revision: D90342: [POC][LoopVectorizer] Propagate ElementCount to interfaces in preparation for scalable auto-vec. Oct 28 2020, 2:12 PM

sdesmalen added a child revision: D90344: [POC][LoopVectorizer] Allow invariant loads/stores using masked gather/scatter for a scalable VF.

Harbormaster completed remote builds in B76810: Diff 301423. Oct 28 2020, 2:37 PM

steleman added a subscriber: steleman. Oct 28 2020, 4:40 PM

khchen added a subscriber: khchen. Oct 28 2020, 5:43 PM

dmgreen added a comment.

This isn't as many changes as I was expecting. I had expected to need a lot of legality changes to make sure scalable vectorization was going to be correct too.

dancgr added a subscriber: dancgr. Nov 3 2020, 9:56 AM

sdesmalen mentioned this in D88962: [SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute. Nov 5 2020, 8:59 AM

This POC has been split up into separate patches: D91059, D91060 and D91077.

In D90343#2371495, @dmgreen wrote:

This isn't as many changes as I was expecting. I had expected to need a lot of legality changes to make sure scalable vectorization was going to be correct too.

Sorry, I missed this comment earlier.

Indeed, fewer changes are needed than you might expect to achieve some initial auto-vectorization with scalable vectors. Some of the mechanisms can already cope with scalable vectors, such as reductions, for which we introduced the intrinsics a couple of years ago because they can't be handled in a scalar reduction loop. For legality, the most critical part that needs changing is the selection of the VF when there is a data dependence. For scalable vectors, the maximum vector width must take vscale into account, and the value assumed for vscale must be sufficiently large (conservative) for the vectorizer to guarantee that a loop with a dependence distance of N bytes can be safely vectorized. In the absence of extra information from the user about the min/max vector width of a scalable vector, we can rely on the architectural maximum vector length for AArch64 SVE/SVE2.
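
To make the dependence-distance argument concrete, here is a small sketch, in plain C++ rather than vectorizer code, of how a conservative upper bound on vscale could cap the scalable VF. The function name and its parameters are hypothetical, and the model is deliberately simplified to "elements touched per vector iteration must fit in the dependence distance". The only outside fact assumed is that SVE allows vector lengths up to 2048 bits, which with 128-bit granules corresponds to a maximum vscale of 16.

```cpp
#include <cassert>
#include <cstdint>

// Returns the largest power-of-two K such that a VF of "vscale x K" elements
// never spans more than DistBytes bytes for any vscale <= MaxVScale, or 0 if
// no scalable VF is safe under this (simplified) model.
uint64_t maxSafeScalableVF(uint64_t DistBytes, uint64_t ElemBytes,
                           uint64_t MaxVScale) {
  assert(ElemBytes > 0 && MaxVScale > 0 && "sizes must be non-zero");
  // Number of elements that fit in the dependence distance.
  uint64_t MaxElems = DistBytes / ElemBytes;
  // Elements available per unit of vscale in the worst case.
  uint64_t Budget = MaxElems / MaxVScale;
  if (Budget == 0)
    return 0;
  // Round down to a power of two without overflowing.
  uint64_t K = 1;
  while (K <= Budget / 2)
    K *= 2;
  return K;
}
```

With a 512-byte distance, `double` elements and a maximum vscale of 16, at most 64 elements are safe, so the sketch picks `vscale x 4`; with a 64-byte distance it returns 0 because even `vscale x 1` could exceed the distance on the widest implementation.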

kawashima-fj added a subscriber: kawashima-fj. Nov 13 2020, 7:50 AM

This patch has been superseded by D91077.

Revision Contents

Path                                                                                Size
llvm/include/llvm/IR/IRBuilder.h                                                    4 lines
llvm/lib/IR/IRBuilder.cpp                                                           11 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                                     87 lines
llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll   65 lines

Diff 301423

llvm/include/llvm/IR/IRBuilder.h

@@ 855 lines not shown @@ public:
   /// Create a call to the experimental.gc.relocate intrinsics to
   /// project the relocated value of one pointer from the statepoint.
   CallInst *CreateGCRelocate(Instruction *Statepoint,
                              int BaseOffset,
                              int DerivedOffset,
                              Type *ResultType,
                              const Twine &Name = "");

+  /// Create a call to llvm.vscale, multiplied by \p Scaling. The type of VScale
+  /// will be the same type as that of \p Scaling.
+  Value *CreateVScale(Constant *Scaling, const Twine &Name = "");
+
   /// Create a call to intrinsic \p ID with 1 operand which is mangled on its
   /// type.
   CallInst *CreateUnaryIntrinsic(Intrinsic::ID ID, Value *V,
                                  Instruction *FMFSource = nullptr,
                                  const Twine &Name = "");

   /// Create a call to intrinsic \p ID with 2 operands which is mangled on the
   /// first type.
@@ 1,769 lines not shown @@

llvm/lib/IR/IRBuilder.cpp

@@ 74 lines not shown @@ static CallInst *createCallHelper(Function *Callee, ArrayRef<Value *> Ops,
                                   Instruction *FMFSource = nullptr,
                                   ArrayRef<OperandBundleDef> OpBundles = {}) {
   CallInst *CI = Builder->CreateCall(Callee, Ops, OpBundles, Name);
   if (FMFSource)
     CI->copyFastMathFlags(FMFSource);
   return CI;
 }

+Value *IRBuilderBase::CreateVScale(Constant *Scaling, const Twine &Name) {
+  Module *M = GetInsertBlock()->getParent()->getParent();
+  assert (isa<ConstantInt>(Scaling) && "Expected constant integer");
+  Function *TheFn =
+      Intrinsic::getDeclaration(M, Intrinsic::vscale, {Scaling->getType()});
+  CallInst *CI = createCallHelper(TheFn, {}, this, Name);
+  return cast<ConstantInt>(Scaling)->getSExtValue() == 1
+             ? CI
+             : CreateMul(CI, Scaling);
+}
+
 CallInst *IRBuilderBase::CreateMemSet(Value *Ptr, Value *Val, Value *Size,
                                       MaybeAlign Align, bool isVolatile,
                                       MDNode *TBAATag, MDNode *ScopeTag,
                                       MDNode *NoAliasTag) {
   Ptr = getCastedInt8PtrValue(Ptr);
   Value *Ops[] = {Ptr, Val, Size, getInt1(isVolatile)};
   Type *Tys[] = { Ptr->getType(), Size->getType() };
   Module *M = BB->getParent()->getParent();
@@ 1,057 lines not shown @@

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ 342 lines not shown @@ if (auto *LI = dyn_cast<LoadInst>(I))
     return LI->getType();
   return cast<StoreInst>(I)->getValueOperand()->getType();
 }

 /// A helper function that returns true if the given type is irregular. The
 /// type is irregular if its allocated size doesn't equal the store size of an
 /// element of the corresponding vector type at the given vectorization factor.
 static bool hasIrregularType(Type *Ty, const DataLayout &DL, ElementCount VF) {
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   // Determine if an array of VF elements of type Ty is "bitcast compatible"
   // with a <VF x Ty> vector.
   if (VF.isVector()) {
     auto *VectorTy = VectorType::get(Ty, VF);
     return TypeSize::get(VF.getKnownMinValue() *
                              DL.getTypeAllocSize(Ty).getFixedValue(),
                          VF.isScalable()) != DL.getTypeStoreSize(VectorTy);
   }
@@ 598 lines not shown @@ if (I->getDebugLoc())
       DL = I->getDebugLoc();
   }

   OptimizationRemarkAnalysis R(PassName, RemarkName, DL, CodeRegion);
   R << "loop not vectorized: ";
   return R;
 }

+/// Return a value for Step multiplied by VF.
+static Value *createStepForVF(IRBuilder<> &B, Constant *Step, ElementCount VF) {
+  assert(isa<ConstantInt>(Step) && "Expected an integer step");
+  Constant *StepVal = ConstantInt::get(
+      Step->getType(),
+      cast<ConstantInt>(Step)->getSExtValue() * VF.getKnownMinValue());
+  return VF.isScalable() ? B.CreateVScale(StepVal) : StepVal;
+}
+
 namespace llvm {

 void reportVectorizationFailure(const StringRef DebugMsg,
     const StringRef OREMsg, const StringRef ORETag,
     OptimizationRemarkEmitter *ORE, Loop *TheLoop, Instruction *I) {
   LLVM_DEBUG(debugVectorizationFailure(DebugMsg, I));
   LoopVectorizeHints Hints(TheLoop, true /* doesn't matter */, ORE);
   ORE->emit(createLVAnalysis(Hints.vectorizeAnalysisPassName(),
@@ 246 lines not shown @@ for (unsigned i = 0; i < Grp->getFactor(); ++i) {
       }
     }
   }

   /// Return the cost model decision for the given instruction \p I and vector
   /// width \p VF. Return CM_Unknown if this instruction did not pass
   /// through the cost modeling.
   InstWidening getWideningDecision(Instruction *I, ElementCount VF) {
-    assert(!VF.isScalable() && "scalable vectors not yet supported.");
-    assert(VF.isVector() && "Expected VF >=2");
+    assert(VF.isVector() && "Expected VF to be a vector VF");

     // Cost model is not run in the VPlan-native path - return conservative
     // result until this changes.
     if (EnableVPlanNativePath)
       return CM_GatherScatter;

     std::pair<Instruction *, ElementCount> InstOnVF = std::make_pair(I, VF);
     auto Itr = WideningDecisions.find(InstOnVF);
     if (Itr == WideningDecisions.end())
@@ 860 lines not shown @@ Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step,
   return BOp;
 }

 void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
                                            Instruction *EntryVal,
                                            const InductionDescriptor &ID) {
   // We shouldn't have to build scalar steps if we aren't vectorizing.
   assert(VF.isVector() && "VF should be greater than one");
-  assert(!VF.isScalable() &&
-         "the code below assumes a fixed number of elements at compile time");
   // Get the value type and ensure it and the step have the same integer type.
   Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
   assert(ScalarIVTy == Step->getType() &&
          "Val and Step should have the same type");

   // We build scalar steps for both integer and floating-point induction
   // variables. Here, we determine the kind of arithmetic we will perform.
   Instruction::BinaryOps AddOp;
   Instruction::BinaryOps MulOp;
   if (ScalarIVTy->isIntegerTy()) {
     AddOp = Instruction::Add;
     MulOp = Instruction::Mul;
   } else {
     AddOp = ID.getInductionOpcode();
     MulOp = Instruction::FMul;
   }

   // Determine the number of scalars we need to generate for each unroll
   // iteration. If EntryVal is uniform, we only need to generate the first
   // lane. Otherwise, we generate all VF values.
   unsigned Lanes =
       Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF)
           ? 1
           : VF.getKnownMinValue();
+  assert((!VF.isScalable() || Lanes == 1) &&
+         "Should never scalarize a scalable vector");
   // Compute the scalar steps and save the results in VectorLoopValueMap.
   for (unsigned Part = 0; Part < UF; ++Part) {
     for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
       auto *StartIdx = getSignedIntOrFpConstant(
           ScalarIVTy, VF.getKnownMinValue() * Part + Lane);
       auto *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, StartIdx, Step));
       auto *Add = addFastMathFlag(Builder.CreateBinOp(AddOp, ScalarIV, Mul));
       VectorLoopValueMap.setScalarValue(EntryVal, {Part, Lane}, Add);
@@ 31 lines not shown @@ if (VF.isScalar()) {
     VectorLoopValueMap.setVectorValue(V, Part, ScalarValue);
     return ScalarValue;
   }

   // Get the last scalar instruction we generated for V and Part. If the value
   // is known to be uniform after vectorization, this corresponds to lane zero
   // of the Part unroll iteration. Otherwise, the last instruction is the one
   // we created for the last vector lane of the Part unroll iteration.
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   unsigned LastLane = Cost->isUniformAfterVectorization(I, VF)
                           ? 0
                           : VF.getKnownMinValue() - 1;
+  assert((!VF.isScalable() || LastLane == 0) &&
+         "Scalable vectorization can't lead to any scalarized values.");
   auto *LastInst = cast<Instruction>(
       VectorLoopValueMap.getScalarValue(V, {Part, LastLane}));

   // Set the insert point after the last scalarized instruction. This ensures
   // the insertelement sequence will directly follow the scalar definitions.
   auto OldIP = Builder.saveIP();
   auto NewIP = std::next(BasicBlock::iterator(LastInst));
   Builder.SetInsertPoint(&*NewIP);
@@ 328 lines not shown @@ LoopVectorizationCostModel::InstWidening Decision =
       Cost->getWideningDecision(Instr, VF);
   assert((Decision == LoopVectorizationCostModel::CM_Widen ||
           Decision == LoopVectorizationCostModel::CM_Widen_Reverse ||
           Decision == LoopVectorizationCostModel::CM_GatherScatter) &&
          "CM decision is not to widen the memory instruction");

   Type *ScalarDataTy = getMemInstValueType(Instr);

-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
   auto *DataTy = VectorType::get(ScalarDataTy, VF);
   const Align Alignment = getLoadStoreAlignment(Instr);

   // Determine if the pointer operand of the access is either consecutive or
   // reverse consecutive.
   bool Reverse = (Decision == LoopVectorizationCostModel::CM_Widen_Reverse);
   bool ConsecutiveStride =
       Reverse || (Decision == LoopVectorizationCostModel::CM_Widen);
@@ 16 lines not shown @@ const auto CreateVecPtr = [&](unsigned Part, Value *Ptr) -> Value * {
     // Calculate the pointer for the specific unroll-part.
     GetElementPtrInst *PartPtr = nullptr;

     bool InBounds = false;
     if (auto *gep = dyn_cast<GetElementPtrInst>(Ptr->stripPointerCasts()))
       InBounds = gep->isInBounds();

     if (Reverse) {
+      Value *Increment = createStepForVF(Builder, Builder.getInt32(-Part), VF);
+
       // If the address is consecutive but reversed, then the
       // wide store needs to start at the last vector element.
       PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(
-          ScalarDataTy, Ptr, Builder.getInt32(-Part * VF.getKnownMinValue())));
+          ScalarDataTy, Ptr, Increment));
       PartPtr->setIsInBounds(InBounds);
+      Value *Offset =
+          Builder.CreateSub(Builder.getInt32(1),
+                            createStepForVF(Builder, Builder.getInt32(1), VF));
       PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(
-          ScalarDataTy, PartPtr, Builder.getInt32(1 - VF.getKnownMinValue())));
+          ScalarDataTy, PartPtr, Offset));
       PartPtr->setIsInBounds(InBounds);
       if (isMaskRequired) // Reverse of a null all-one mask is a null mask.
         BlockInMaskParts[Part] = reverseVector(BlockInMaskParts[Part]);
     } else {
-      PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(
-          ScalarDataTy, Ptr, Builder.getInt32(Part * VF.getKnownMinValue())));
+      Value *Increment = createStepForVF(Builder, Builder.getInt32(Part), VF);
+      PartPtr = cast<GetElementPtrInst>(
+          Builder.CreateGEP(ScalarDataTy, Ptr, Increment));
       PartPtr->setIsInBounds(InBounds);
     }

     unsigned AddressSpace = Ptr->getType()->getPointerAddressSpace();
     return Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));
   };

   // Handle Stores:
@@ 179 lines not shown @@
 Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {
   if (VectorTripCount)
     return VectorTripCount;

   Value *TC = getOrCreateTripCount(L);
   IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());

   Type *Ty = TC->getType();

   // This is where we can make the step a runtime constant.
-  assert(!VF.isScalable() && "scalable vectorization is not supported yet");
-  Constant *Step = ConstantInt::get(Ty, VF.getKnownMinValue() * UF);
+  Value *Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF);

   // If the tail is to be folded by masking, round the number of iterations N
   // up to a multiple of Step instead of rounding down. This is done by first
   // adding Step-1 and then rounding down. Note that it's ok if this addition
   // overflows: the vector induction variable will eventually wrap to zero given
   // that it starts at zero and its Step is a power of two; the loop will then
   // exit, with the last early-exit vector comparison also producing all-true.
   if (Cost->foldTailByMasking()) {
     assert(isPowerOf2_32(VF.getKnownMinValue() * UF) &&
            "VF*UF must be a power of 2 when folding tail by masking");
+    assert(!VF.isScalable() &&
+           "Tail folding not yet supported for scalable vectors");
     TC = Builder.CreateAdd(
         TC, ConstantInt::get(Ty, VF.getKnownMinValue() * UF - 1), "n.rnd.up");
   }

   // Now we need to generate the expression for the part of the loop that the
   // vectorized body will execute. This is equal to N - (N % Step) if scalar
   // iterations are not required for correctness, or N - Step, otherwise. Step
   // is equal to the vectorization factor (number of SIMD elements) times the
@@ 62 lines not shown @@ void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,
   // to the backedge-taken count overflowed leading to an incorrect trip count
   // of zero. In this case we will also jump to the scalar loop.
   auto P = Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE
                                           : ICmpInst::ICMP_ULT;

   // If tail is to be folded, vector loop takes care of all iterations.
   Value *CheckMinIters = Builder.getFalse();
   if (!Cost->foldTailByMasking()) {
-    assert(!VF.isScalable() && "scalable vectors not yet supported.");
-    CheckMinIters = Builder.CreateICmp(
-        P, Count,
-        ConstantInt::get(Count->getType(), VF.getKnownMinValue() * UF),
-        "min.iters.check");
+    Value *Step =
+        createStepForVF(Builder, ConstantInt::get(Count->getType(), UF), VF);
+    CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
   }
   // Create new preheader for vector loop.
   LoopVectorPreHeader =
       SplitBlock(TCCheckBlock, TCCheckBlock->getTerminator(), DT, LI, nullptr,
                  "vector.ph");

   assert(DT->properlyDominates(DT->getNode(TCCheckBlock),
                                DT->getNode(Bypass)->getIDom()) &&
@@ 439 lines not shown @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {
   // - counts from zero, stepping by one
   // - is the size of the widest induction variable type
   // then we create a new one.
   OldInduction = Legal->getPrimaryInduction();
   Type *IdxTy = Legal->getWidestInductionType();
   Value *StartIdx = ConstantInt::get(IdxTy, 0);
   // The loop step is equal to the vectorization factor (num of SIMD elements)
   // times the unroll factor (num of SIMD instructions).
-  assert(!VF.isScalable() && "scalable vectors not yet supported.");
-  Constant *Step = ConstantInt::get(IdxTy, VF.getKnownMinValue() * UF);
+  Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
+  Value *Step = createStepForVF(Builder, ConstantInt::get(IdxTy, UF), VF);
   Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
   Induction =
       createInductionVariable(Lp, StartIdx, CountRoundDown, Step,
                               getDebugLocFromInstOrOperands(OldInduction));

   // Emit phis for the new starting index of the scalar loop.
   createInductionResumeValues(Lp, CountRoundDown);

@@ 363 lines not shown @@ void InnerLoopVectorizer::fixVectorizedLoop() {
   // loop iterations are now distributed among them. Note that original loop
   // represented by LoopScalarBody becomes remainder loop after vectorization.
   //
   // For cases like foldTailByMasking() and requiresScalarEpiloque() we may
   // end up getting slightly roughened result but that should be OK since
   // profile is not inherently precise anyway. Note also possible bypass of
   // vector code caused by legality checks is ignored, assigning all the weight
   // to the vector loop, optimistically.
-  assert(!VF.isScalable() &&
-         "cannot use scalable ElementCount to determine unroll factor");
+  // TODO: Consider changing this calculation for scalable vectors.
   setProfileInfoAfterUnrolling(
       LI->getLoopFor(LoopScalarBody), LI->getLoopFor(LoopVectorBody),
       LI->getLoopFor(LoopScalarBody), VF.getKnownMinValue() * UF);
 }

 void InnerLoopVectorizer::fixCrossIterationPHIs() {
   // In order to support recurrences we need to be able to vectorize Phi nodes.
   // Phi nodes have cycles, so we need to vectorize them in two stages. This is
@@ 446 lines not shown @@ for (User *U : Cur->users()) {
         if ((Cur != LoopExitInstr || OrigLoop->contains(UI->getParent())) &&
             Visited.insert(UI).second)
           Worklist.push_back(UI);
       }
     }
 }

 void InnerLoopVectorizer::fixLCSSAPHIs() {
-  assert(!VF.isScalable() && "the code below assumes fixed width vectors");
   for (PHINode &LCSSAPhi : LoopExitBlock->phis()) {
     if (LCSSAPhi.getNumIncomingValues() == 1) {
       auto *IncomingValue = LCSSAPhi.getIncomingValue(0);
       // Non-instruction incoming values will have only one value.
       unsigned LastLane = 0;
       if (isa<Instruction>(IncomingValue))
         LastLane = Cost->isUniformAfterVectorization(
                        cast<Instruction>(IncomingValue), VF)
                        ? 0
                        : VF.getKnownMinValue() - 1;
+      assert((!VF.isScalable() || LastLane == 0) &&
+             "scalable vectors dont support non-uniform scalars yet");
       // Can be a loop invariant incoming value or the last scalar value to be
       // extracted from the vectorized loop.
       Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
       Value *lastIncomingValue =
           getOrCreateScalarValue(IncomingValue, { UF - 1, LastLane });
       LCSSAPhi.addIncoming(lastIncomingValue, LoopMiddleBlock);
     }
   }
 }
▲ Show 20 Lines • Show All 315 Lines • ▼ Show 20 Lines	assert((I.getOpcode() == Instruction::UDiv \|\|
"Unexpected instruction");		"Unexpected instruction");
Value *Divisor = I.getOperand(1);		Value *Divisor = I.getOperand(1);
auto *CInt = dyn_cast<ConstantInt>(Divisor);		auto *CInt = dyn_cast<ConstantInt>(Divisor);
return !CInt || CInt->isZero();		return !CInt || CInt->isZero();
}		}

void InnerLoopVectorizer::widenInstruction(Instruction &I, VPUser &User,		void InnerLoopVectorizer::widenInstruction(Instruction &I, VPUser &User,
VPTransformState &State) {		VPTransformState &State) {
assert(!VF.isScalable() && "scalable vectors not yet supported.");
switch (I.getOpcode()) {		switch (I.getOpcode()) {
case Instruction::Call:		case Instruction::Call:
case Instruction::Br:		case Instruction::Br:
case Instruction::PHI:		case Instruction::PHI:
case Instruction::GetElementPtr:		case Instruction::GetElementPtr:
case Instruction::Select:		case Instruction::Select:
llvm_unreachable("This instruction is handled by a different recipe.");		llvm_unreachable("This instruction is handled by a different recipe.");
case Instruction::UDiv:		case Instruction::UDiv:
[372 lines not shown]	LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *IndUpdate
<< "\n");		<< "\n");
}		}

Scalars[VF].insert(Worklist.begin(), Worklist.end());		Scalars[VF].insert(Worklist.begin(), Worklist.end());
}		}

bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I,		bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I,
ElementCount VF) {		ElementCount VF) {
assert(!VF.isScalable() && "scalable vectors not yet supported.");
if (!blockNeedsPredication(I->getParent()))		if (!blockNeedsPredication(I->getParent()))
return false;		return false;
switch(I->getOpcode()) {		switch(I->getOpcode()) {
default:		default:
break;		break;
case Instruction::Load:		case Instruction::Load:
case Instruction::Store: {		case Instruction::Store: {
if (!Legal->isMaskRequired(I))		if (!Legal->isMaskRequired(I))
[681 lines not shown]	if (EnableIndVarRegisterHeur) {
PowerOf2Floor((TargetNumRegisters - LoopInvariantRegs - 1) /		PowerOf2Floor((TargetNumRegisters - LoopInvariantRegs - 1) /
std::max(1U, (MaxLocalUsers - 1)));		std::max(1U, (MaxLocalUsers - 1)));
}		}

IC = std::min(IC, TmpIC);		IC = std::min(IC, TmpIC);
}		}

// Clamp the interleave ranges to reasonable counts.		// Clamp the interleave ranges to reasonable counts.
assert(!VF.isScalable() && "scalable vectors not yet supported.");
unsigned MaxInterleaveCount =		unsigned MaxInterleaveCount =
TTI.getMaxInterleaveFactor(VF.getKnownMinValue());		TTI.getMaxInterleaveFactor(VF.getKnownMinValue());

// Check if the user has overridden the max.		// Check if the user has overridden the max.
if (VF.isScalar()) {		if (VF.isScalar()) {
if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)		if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)
MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;		MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;
} else {		} else {
[185 lines not shown]	LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) {

LLVM_DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n");		LLVM_DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n");

// A lambda that gets the register usage for the given type and VF.		// A lambda that gets the register usage for the given type and VF.
auto GetRegUsage = [&DL, WidestRegister](Type *Ty, ElementCount VF) {		auto GetRegUsage = [&DL, WidestRegister](Type *Ty, ElementCount VF) {
if (Ty->isTokenTy())		if (Ty->isTokenTy())
return 0U;		return 0U;
unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());		unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());
assert(!VF.isScalable() && "scalable vectors not yet supported.");		// This assert can be removed, because the answer probably wouldn't be
		// any different than for fixed-width vectors.
		// assert(!VF.isScalable() && "scalable vectors not yet supported.");
return std::max<unsigned>(1, VF.getKnownMinValue() * TypeSize /		return std::max<unsigned>(1, VF.getKnownMinValue() * TypeSize /
WidestRegister);		WidestRegister);
};		};

for (unsigned int i = 0, s = IdxToInstr.size(); i < s; ++i) {		for (unsigned int i = 0, s = IdxToInstr.size(); i < s; ++i) {
Instruction *I = IdxToInstr[i];		Instruction *I = IdxToInstr[i];

// Remove all of the instructions that end at this location.		// Remove all of the instructions that end at this location.
[261 lines not shown]	while (!Worklist.empty()) {
ScalarCosts[I] = ScalarCost;		ScalarCosts[I] = ScalarCost;
}		}

return Discount;		return Discount;
}		}

LoopVectorizationCostModel::VectorizationCostTy		LoopVectorizationCostModel::VectorizationCostTy
LoopVectorizationCostModel::expectedCost(ElementCount VF) {		LoopVectorizationCostModel::expectedCost(ElementCount VF) {
assert(!VF.isScalable() && "scalable vectors not yet supported.");
VectorizationCostTy Cost;		VectorizationCostTy Cost;

// For each block.		// For each block.
for (BasicBlock *BB : TheLoop->blocks()) {		for (BasicBlock *BB : TheLoop->blocks()) {
VectorizationCostTy BlockCost;		VectorizationCostTy BlockCost;

// For each instruction in the old loop.		// For each instruction in the old loop.
for (Instruction &I : BB->instructionsWithoutDebug()) {		for (Instruction &I : BB->instructionsWithoutDebug()) {
[230 lines not shown]	return TTI.getAddressComputationCost(ValTy) +
TTI::TCK_RecipThroughput, I);		TTI::TCK_RecipThroughput, I);
}		}
return getWideningCost(I, VF);		return getWideningCost(I, VF);
}		}

LoopVectorizationCostModel::VectorizationCostTy		LoopVectorizationCostModel::VectorizationCostTy
LoopVectorizationCostModel::getInstructionCost(Instruction *I,		LoopVectorizationCostModel::getInstructionCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
assert(!VF.isScalable() &&
"the cost model is not yet implemented for scalable vectorization");
// If we know that this instruction will remain uniform, check the cost of		// If we know that this instruction will remain uniform, check the cost of
// the scalar version.		// the scalar version.
if (isUniformAfterVectorization(I, VF))		if (isUniformAfterVectorization(I, VF))
VF = ElementCount::getFixed(1);		VF = ElementCount::getFixed(1);

		// FIXME: Implement a proper cost-model for scalable vectors.
		// For now disable the LoopVectorizationCostModel for scalable vectors,
		// because there are too many code-paths that have either:
		// `assert(!VF.isScalable())` or `cast<FixedVectorType>(..)`.
		// After we fix up those code-paths, we can remove this shortcut.
		// Because the only way to vectorize a loop using scalable vectors is by
		// forcing it through the LoopHints, the cost-model for scalable vectors
		// is not yet relevant anyway.
		if (VF.isScalable())
		return {1, true};

if (VF.isVector() && isProfitableToScalarize(I, VF))		if (VF.isVector() && isProfitableToScalarize(I, VF))
return VectorizationCostTy(InstsToScalarize[VF][I], false);		return VectorizationCostTy(InstsToScalarize[VF][I], false);

// Forced scalars do not have any scalarization overhead.		// Forced scalars do not have any scalarization overhead.
auto ForcedScalar = ForcedScalars.find(VF);		auto ForcedScalar = ForcedScalars.find(VF);
if (VF.isVector() && ForcedScalar != ForcedScalars.end()) {		if (VF.isVector() && ForcedScalar != ForcedScalars.end()) {
auto InstSet = ForcedScalar->second;		auto InstSet = ForcedScalar->second;
if (InstSet.count(I))		if (InstSet.count(I))
[42 lines not shown]	unsigned LoopVectorizationCostModel::getScalarizationOverhead(Instruction *I,

// Skip operands that do not require extraction/scalarization and do not incur		// Skip operands that do not require extraction/scalarization and do not incur
// any overhead.		// any overhead.
return Cost + TTI.getOperandsScalarizationOverhead(		return Cost + TTI.getOperandsScalarizationOverhead(
filterExtractingOperands(Ops, VF), VF.getKnownMinValue());		filterExtractingOperands(Ops, VF), VF.getKnownMinValue());
}		}

void LoopVectorizationCostModel::setCostBasedWideningDecision(ElementCount VF) {		void LoopVectorizationCostModel::setCostBasedWideningDecision(ElementCount VF) {
assert(!VF.isScalable() && "scalable vectors not yet supported.");
if (VF.isScalar())		if (VF.isScalar())
return;		return;
NumPredStores = 0;		NumPredStores = 0;
for (BasicBlock *BB : TheLoop->blocks()) {		for (BasicBlock *BB : TheLoop->blocks()) {
// For each instruction in the old loop.		// For each instruction in the old loop.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
Value *Ptr = getLoadStorePointerOperand(&I);		Value *Ptr = getLoadStorePointerOperand(&I);
if (!Ptr)		if (!Ptr)
[572 lines not shown]	LoopVectorizationPlanner::planInVPlanNativePath(ElementCount UserVF) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "		dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "
"VPlan-native path.\n");		"VPlan-native path.\n");
return VectorizationFactor::Disabled();		return VectorizationFactor::Disabled();
}		}

Optional<VectorizationFactor>		Optional<VectorizationFactor>
LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {		LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
assert(!UserVF.isScalable() && "scalable vectorization not yet handled");
assert(OrigLoop->isInnermost() && "Inner loop expected.");		assert(OrigLoop->isInnermost() && "Inner loop expected.");
Optional<ElementCount> MaybeMaxVF =		Optional<ElementCount> MaybeMaxVF =
CM.computeMaxVF(UserVF, UserIC);		CM.computeMaxVF(UserVF, UserIC);
if (!MaybeMaxVF) // Cases that should not to be vectorized nor interleaved.		if (!MaybeMaxVF) // Cases that should not to be vectorized nor interleaved.
return None;		return None;

// Invalidate interleave groups if all blocks of loop will be predicated.		// Invalidate interleave groups if all blocks of loop will be predicated.
if (CM.blockNeedsPredication(OrigLoop->getHeader()) &&		if (CM.blockNeedsPredication(OrigLoop->getHeader()) &&
[1,087 lines not shown]	if (Kind == RecurrenceDescriptor::RK_IntegerMinMax ||
(Instruction::BinaryOps)I->getOpcode(), NewRed, PrevInChain);		(Instruction::BinaryOps)I->getOpcode(), NewRed, PrevInChain);
}		}
State.ValueMap.setVectorValue(I, Part, NextInChain);		State.ValueMap.setVectorValue(I, Part, NextInChain);
}		}
}		}

void VPReplicateRecipe::execute(VPTransformState &State) {		void VPReplicateRecipe::execute(VPTransformState &State) {
if (State.Instance) { // Generate a single instance.		if (State.Instance) { // Generate a single instance.
		assert(!State.VF.isScalable() && "Can't scalarise a scalable vector");
State.ILV->scalarizeInstruction(Ingredient, this, State.Instance,		State.ILV->scalarizeInstruction(Ingredient, this, State.Instance,
IsPredicated, State);		IsPredicated, State);
// Insert scalar instance packing it into a vector.		// Insert scalar instance packing it into a vector.
if (AlsoPack && State.VF.isVector()) {		if (AlsoPack && State.VF.isVector()) {
// If we're constructing lane 0, initialize to start from undef.		// If we're constructing lane 0, initialize to start from undef.
if (State.Instance->Lane == 0) {		if (State.Instance->Lane == 0) {
assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");		assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
Value *Undef =		Value *Undef =
UndefValue::get(VectorType::get(Ingredient->getType(), State.VF));		UndefValue::get(VectorType::get(Ingredient->getType(), State.VF));
State.ValueMap.setVectorValue(Ingredient, State.Instance->Part, Undef);		State.ValueMap.setVectorValue(Ingredient, State.Instance->Part, Undef);
}		}
State.ILV->packScalarIntoVectorValue(Ingredient, *State.Instance);		State.ILV->packScalarIntoVectorValue(Ingredient, *State.Instance);
}		}
return;		return;
}		}

// Generate scalar instances for all VF lanes of all UF parts, unless the		// Generate scalar instances for all VF lanes of all UF parts, unless the
// instruction is uniform inwhich case generate only the first lane for each		// instruction is uniform inwhich case generate only the first lane for each
// of the UF parts.		// of the UF parts.
unsigned EndLane = IsUniform ? 1 : State.VF.getKnownMinValue();		unsigned EndLane = IsUniform ? 1 : State.VF.getKnownMinValue();
		assert((!State.VF.isScalable() || IsUniform) &&
		"Can't scalarise a scalable vector");
for (unsigned Part = 0; Part < State.UF; ++Part)		for (unsigned Part = 0; Part < State.UF; ++Part)
for (unsigned Lane = 0; Lane < EndLane; ++Lane)		for (unsigned Lane = 0; Lane < EndLane; ++Lane)
State.ILV->scalarizeInstruction(Ingredient, *this, {Part, Lane},		State.ILV->scalarizeInstruction(Ingredient, *this, {Part, Lane},
IsPredicated, State);		IsPredicated, State);
}		}

void VPBranchOnMaskRecipe::execute(VPTransformState &State) {		void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
assert(State.Instance && "Branch on Mask works only on single instance.");		assert(State.Instance && "Branch on Mask works only on single instance.");
[575 lines not shown]

llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll

This file was added.

				; For now this test requires aarch64-registered-target, until we can
				; also pass the loop hint as a 'force-vector-width' flag to opt.
				; REQUIRES: aarch64-registered-target

				; RUN: opt -S -loop-vectorize -instcombine < %s | FileCheck %s

				source_filename = "loop.c"
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				; CHECK: for.body.preheader:
				; CHECK-DAG: %wide.trip.count = zext i32 %N to i64
				; CHECK-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECK-DAG: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
				; CHECK-DAG: %min.iters.check = icmp ugt i64 %[[VSCALEX4]], %wide.trip.count

				; CHECK: vector.ph:
				; CHECK-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECK-DAG: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
				; CHECK-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX4]]
				; CHECK: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf

				; CHECK: vector.body:
				; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; CHECK: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index
				; CHECK: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*
				; CHECK: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8, !alias.scope !0
				; CHECK: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> undef, double 1.000000e+00, i32 0), <vscale x 4 x double> undef, <vscale x 4 x i32> zeroinitializer)
				; CHECK: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index
				; CHECK: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*
				; CHECK: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8, !alias.scope !3, !noalias !0
				; CHECK: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECK: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
				; CHECK: %index.next = add i64 %index, %[[VSCALEX4]]
				; CHECK: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec
				; CHECK: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !5

				define void @loop(i32 %N, double* nocapture %a, double* nocapture readonly %b) {
				entry:
				%cmp7 = icmp sgt i32 %N, 0
				br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%wide.trip.count = zext i32 %N to i64
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				for.body: ; preds = %for.body.preheader, %for.body
				%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds double, double* %b, i64 %indvars.iv
				%0 = load double, double* %arrayidx, align 8
				%add = fadd double %0, 1.000000e+00
				%arrayidx2 = getelementptr inbounds double, double* %a, i64 %indvars.iv
				store double %add, double* %arrayidx2, align 8
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !1
				}

				!1 = distinct !{!1, !2, !3}
				!2 = !{!"llvm.loop.vectorize.width", !4}
				!3 = !{!"llvm.loop.interleave.count", i32 1}
				!4 = !{i32 4, i1 1}
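
The CHECK lines in the test above expect the vector step to be computed at runtime as vscale * 4, and the trip count to be split into a vector part (%n.vec) and a scalar remainder (%n.mod.vf). The following is a minimal sketch of that arithmetic only, not code from the patch; `TripSplit`, `splitTripCount`, and the `VScale` parameter are hypothetical names used for illustration, and in real code vscale is supplied by the hardware rather than passed in.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the values the test checks for; not part of the patch.
struct TripSplit {
  bool SkipVectorLoop;  // models %min.iters.check (icmp ugt Step, TripCount)
  uint64_t ScalarTail;  // models %n.mod.vf (urem TripCount, Step)
  uint64_t VectorIters; // models %n.vec (sub TripCount, ScalarTail)
};

// KnownMinVF is the fixed multiplier of the scalable VF, 4 for <vscale x 4 x double>.
TripSplit splitTripCount(uint64_t WideTripCount, uint64_t VScale,
                         uint64_t KnownMinVF = 4) {
  assert(VScale > 0 && KnownMinVF > 0 && "step must be non-zero");
  // The test sees this as `shl i64 %vscale, 2` after instcombine.
  uint64_t Step = VScale * KnownMinVF;

  TripSplit S;
  // If one vector step exceeds the trip count, the vector loop is bypassed.
  S.SkipVectorLoop = Step > WideTripCount;
  // Iterations that do not fill a whole vector run in the scalar tail.
  S.ScalarTail = WideTripCount % Step;
  S.VectorIters = WideTripCount - S.ScalarTail;
  return S;
}
```

For instance, with a trip count of 10 and an assumed vscale of 2, the step is 8, so 8 iterations are covered by the vector loop and 2 are left for the scalar tail.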