This is an archive of the discontinued LLVM Phabricator instance.

[LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversions
ClosedPublic

Authored by mssimpso on Jul 11 2016, 2:41 PM.

Details

Reviewers

Summary

This patch prevents increases in the number of instructions, pre-instcombine, due to induction variable scalarization. An increase in instructions can lead to an increase in the compile-time required to simplify the induction variables. We now maintain a new map for scalarized induction variables to prevent us from converting between the scalar and vector forms.

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 63576. Jul 11 2016, 2:41 PM

mssimpso retitled this revision from to [LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversions.

mssimpso updated this object.

mssimpso added a reviewer: anemet.

mssimpso added subscribers: cmatthews, llvm-commits.

Herald added subscribers: mzolotukhin, mcrosier. Jul 11 2016, 2:41 PM

Hi Matt,

Looks basically good.

A bit more high-level info about the approach would be nice next time. I think I understand that you are using the new Map in the scalarizeInst.

You're missing tests.

Adam

lib/Transforms/Vectorize/LoopVectorize.cpp
604–607	Please explain the relation to WidenMap. It might also be a good idea to move it closer there.

Addressed Adam's comments.

Renamed ScalarMap to ScalarIVMap since we're currently only using it for induction variables (though it's possible we may want to reuse this in the future to further limit the scalar/vector conversions for other cases).
Updated ScalarIVMap comment and moved declaration.
Added comment in vectorizeMemoryInstruction
Added some pre-instcombine tests.
Verified compile-time improvement. I see significant speedups in >25 tests from the test-suite with no slowdowns.

Adam,

Thanks for the feedback, and sorry for the lack of explanation. Yes, you understand correctly; here's a quick overview.

When we build the steps for induction variables that we want to keep scalar, we no longer insert them into a vector maintained in WidenMap; we leave them as-is and store the steps in ScalarIVMap. Then, whenever we need to scalarize a use of an induction variable, instead of accessing WidenMap and creating an extractelement instruction, we just grab the appropriate value from ScalarIVMap. The extracts this replaces are in two places: one in scalarizeInstruction, and one in vectorizeMemoryInstruction.
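
As a rough illustration of the bookkeeping described above, here is a minimal standalone sketch. It is not the patch's code and uses no LLVM types: `ScalarStepMap`, `getScalarStep`, the string keys, and the plain integer values are all hypothetical stand-ins for `ScalarIVMap` and the IR values it holds. Only the indexing scheme (`VF * Part + Lane`) and the step formula are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for ScalarIVMap: maps an original-loop value (named
// by a string here) to its VF * UF scalar step values.
using ScalarStepMap = std::unordered_map<std::string, std::vector<int64_t>>;

// Same shape as buildScalarSteps in the diff: one entry per (Part, Lane),
// computed as ScalarIV + (VF * Part + Lane) * Step and appended in order.
void buildScalarSteps(ScalarStepMap &Map, const std::string &EntryVal,
                      int64_t ScalarIV, int64_t Step, unsigned VF,
                      unsigned UF) {
  assert(VF > 1 && "VF should be greater than one");
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      Map[EntryVal].push_back(ScalarIV + int64_t(VF * Part + Lane) * Step);
}

// Same shape as the lookup in scalarizeInstruction: read the scalar step
// directly instead of extracting a lane from a widened vector.
int64_t getScalarStep(const ScalarStepMap &Map, const std::string &EntryVal,
                      unsigned VF, unsigned Part, unsigned Lane) {
  auto It = Map.find(EntryVal);
  assert(It != Map.end() && "no scalarized steps recorded for this value");
  assert(VF * Part + Lane < It->second.size() && "step index out of range");
  return It->second[VF * Part + Lane];
}
```

With VF = 2, UF = 2, and a step of 8, the stored offsets from the scalar IV are 0, 8, 16, and 24, which is the pattern the UNROLL-NO-IC checks for `@scalarize_induction_variable_02` look for.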

The approach cuts the number of instructions, pre-instcombine, related to induction variable scalarization by 2/3 since we eliminate, for each step, an insertelement and an extractelement instruction. Compared to always vectorizing the induction variables, scalarizing now doesn't increase the number of instructions. This is because where we previously would have had an extract from the induction variable vector, we now have a scalar step computation. I hope that makes sense!
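
The 2/3 figure can be checked with a small counting sketch. This is only an assumption-laden model, not a measurement: it assumes each scalar step costs one instruction (an add, with the constant multiply folded away, as the `add i64 %index, 1` lines in the new tests suggest), and that the old scheme added one insertelement and one extractelement per step. The names `IVInstrCount` and `countIVInstrs` are hypothetical.

```cpp
#include <cassert>

// Pre-instcombine instruction counts for scalarizing one induction variable.
struct IVInstrCount {
  unsigned Before; // step add + insertelement + extractelement per step
  unsigned After;  // step add only
};

// Assumes one add per step (constant multiply folded); VF * UF steps total.
IVInstrCount countIVInstrs(unsigned VF, unsigned UF) {
  assert(VF > 1 && "VF should be greater than one");
  const unsigned Steps = VF * UF;
  IVInstrCount Count;
  Count.Before = Steps * 3;
  Count.After = Steps * 1;
  return Count;
}
```

For VF = 2 and UF = 2 this gives 12 instructions before and 4 after, so 8 of 12 are eliminated, matching the stated reduction of 2/3.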

Thanks for the changes, just one more thing. For absolute paranoia, can you please also add a test with interleaving>1 (no instcombine)?

lib/Transforms/Vectorize/LoopVectorize.cpp
2475	This sounds a bit confusing. How about a comment talking about this IV having a scalarized version, like what you say in the next hunk?

Addressed Adam's comments.

Updated comment about GepOperand
Added tests with interleave > 1 and no instcombine

Thanks!

LGTM!

This revision is now accepted and ready to land. Jul 13 2016, 2:56 PM

Committed in rL275419. Thanks, Adam!

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	114 lines
test/Transforms/LoopVectorize/induction.ll	49 lines
test/Transforms/LoopVectorize/reverse_induction.ll	24 lines

Diff 63863

lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 397 Lines • ▼ Show 20 Lines	protected:
/// this is needed because each iteration in the loop corresponds to a SIMD		/// this is needed because each iteration in the loop corresponds to a SIMD
/// element.		/// element.
virtual Value *getBroadcastInstrs(Value *V);		virtual Value *getBroadcastInstrs(Value *V);

/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)		/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)
/// to each vector element of Val. The sequence starts at StartIndex.		/// to each vector element of Val. The sequence starts at StartIndex.
virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step);		virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step);

/// Compute a step vector like the above function, but scalarize the		/// Compute scalar induction steps. \p ScalarIV is the scalar induction
/// arithmetic instead. The results of the computation are inserted into a		/// variable on which to base the steps, \p Step is the size of the step, and
/// new vector with VF elements. \p Val is the initial value, \p Step is the		/// \p EntryVal is the value from the original loop that maps to the steps.
/// size of the step, and \p StartIdx indicates the index of the increment		/// Note that \p EntryVal doesn't have to be an induction variable (e.g., it
/// from which to start computing the steps.		/// can be a truncate instruction).
Value *getScalarizedStepVector(Value *Val, int StartIdx, Value *Step);		void buildScalarSteps(Value *ScalarIV, Value *Step, Value *EntryVal);

/// Create a vector induction phi node based on an existing scalar one. This		/// Create a vector induction phi node based on an existing scalar one. This
/// currently only works for integer induction variables with a constant		/// currently only works for integer induction variables with a constant
/// step. If \p TruncType is non-null, instead of widening the original IV,		/// step. If \p TruncType is non-null, instead of widening the original IV,
/// we widen a version of the IV truncated to \p TruncType.		/// we widen a version of the IV truncated to \p TruncType.
void createVectorIntInductionPHI(const InductionDescriptor &II,		void createVectorIntInductionPHI(const InductionDescriptor &II,
VectorParts &Entry, IntegerType *TruncType);		VectorParts &Entry, IntegerType *TruncType);

/// Widen an integer induction variable \p IV. If \p TruncType is provided,		/// Widen an integer induction variable \p IV. If \p Trunc is provided, the
/// the induction variable will first be truncated to the specified type. The		/// induction variable will first be truncated to the corresponding type. The
/// widened values are placed in \p Entry.		/// widened values are placed in \p Entry.
void widenIntInduction(PHINode *IV, VectorParts &Entry,		void widenIntInduction(PHINode *IV, VectorParts &Entry,
IntegerType *TruncType = nullptr);		TruncInst *Trunc = nullptr);

/// When we go over instructions in the basic block we rely on previous		/// When we go over instructions in the basic block we rely on previous
/// values within the current basic block or on loop invariant values.		/// values within the current basic block or on loop invariant values.
/// When we widen (vectorize) values we place them in the map. If the values		/// When we widen (vectorize) values we place them in the map. If the values
/// are not within the map, they have to be loop invariant, so we simply		/// are not within the map, they have to be loop invariant, so we simply
/// broadcast them into a vector.		/// broadcast them into a vector.
VectorParts &getVectorValue(Value *V);		VectorParts &getVectorValue(Value *V);

▲ Show 20 Lines • Show All 134 Lines • ▼ Show 20 Lines	protected:
SmallVector<BasicBlock *, 4> LoopBypassBlocks;		SmallVector<BasicBlock *, 4> LoopBypassBlocks;

/// The new Induction variable which was added to the new block.		/// The new Induction variable which was added to the new block.
PHINode *Induction;		PHINode *Induction;
/// The induction variable of the old basic block.		/// The induction variable of the old basic block.
PHINode *OldInduction;		PHINode *OldInduction;
/// Maps scalars to widened vectors.		/// Maps scalars to widened vectors.
ValueMap WidenMap;		ValueMap WidenMap;

		/// A map of induction variables from the original loop to their
		/// corresponding VF * UF scalarized values in the vectorized loop. The
		/// purpose of ScalarIVMap is similar to that of WidenMap. Whereas WidenMap
		/// maps original loop values to their vector versions in the new loop,
		/// ScalarIVMap maps induction variables from the original loop that are not
		/// vectorized to their scalar equivalents in the vector loop. Maintaining a
		/// separate map for scalarized induction variables allows us to avoid
		/// unnecessary scalar-to-vector-to-scalar conversions.
		DenseMap<Value *, SmallVector<Value *, 8>> ScalarIVMap;

/// Store instructions that should be predicated, as a pair		/// Store instructions that should be predicated, as a pair
/// <StoreInst, Predicate>		/// <StoreInst, Predicate>
SmallVector<std::pair<StoreInst *, Value *>, 4> PredicatedStores;		SmallVector<std::pair<StoreInst *, Value *>, 4> PredicatedStores;
EdgeMaskCache MaskCache;		EdgeMaskCache MaskCache;
/// Trip count of the original loop.		/// Trip count of the original loop.
Value *TripCount;		Value *TripCount;
/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))		/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))
Value *VectorTripCount;		Value *VectorTripCount;

/// Map of scalar integer values to the smallest bitwidth they can be legally		/// Map of scalar integer values to the smallest bitwidth they can be legally
/// represented as. The vector equivalents of these values should be truncated		/// represented as. The vector equivalents of these values should be truncated
/// to this type.		/// to this type.
const MapVector<Instruction *, uint64_t> *MinBWs;		const MapVector<Instruction *, uint64_t> *MinBWs;

/// A set of values that should not be widened. This is taken from		/// A set of values that should not be widened. This is taken from
/// VecValuesToIgnore in the cost model.		/// VecValuesToIgnore in the cost model.
SmallPtrSetImpl<const Value *> *ValuesNotWidened;		SmallPtrSetImpl<const Value *> *ValuesNotWidened;

LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;

// Record whether runtime checks are added.		// Record whether runtime checks are added.
bool AddedSafetyChecks;		bool AddedSafetyChecks;
};		};

class InnerLoopUnroller : public InnerLoopVectorizer {		class InnerLoopUnroller : public InnerLoopVectorizer {
public:		public:
InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,		InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
LoopInfo *LI, DominatorTree *DT,		LoopInfo *LI, DominatorTree *DT,
const TargetLibraryInfo *TLI,		const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI, AssumptionCache *AC,		const TargetTransformInfo *TTI, AssumptionCache *AC,
▲ Show 20 Lines • Show All 1,278 Lines • ▼ Show 20 Lines	for (unsigned Part = 0; Part < UF; ++Part) {
LastInduction = Builder.CreateAdd(LastInduction, SplatVF, "step.add");		LastInduction = Builder.CreateAdd(LastInduction, SplatVF, "step.add");
}		}

VecInd->addIncoming(SteppedStart, LoopVectorPreHeader);		VecInd->addIncoming(SteppedStart, LoopVectorPreHeader);
VecInd->addIncoming(LastInduction, LoopVectorBody);		VecInd->addIncoming(LastInduction, LoopVectorBody);
}		}

void InnerLoopVectorizer::widenIntInduction(PHINode *IV, VectorParts &Entry,		void InnerLoopVectorizer::widenIntInduction(PHINode *IV, VectorParts &Entry,
IntegerType *TruncType) {		TruncInst *Trunc) {

auto II = Legal->getInductionVars()->find(IV);		auto II = Legal->getInductionVars()->find(IV);
assert(II != Legal->getInductionVars()->end() && "IV is not an induction");		assert(II != Legal->getInductionVars()->end() && "IV is not an induction");

auto ID = II->second;		auto ID = II->second;
assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");		assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");

		// If a truncate instruction was provided, get the smaller type.
		auto *TruncType = Trunc ? cast<IntegerType>(Trunc->getType()) : nullptr;

// The step of the induction.		// The step of the induction.
Value *Step = nullptr;		Value *Step = nullptr;

// If the induction variable has a constant integer step value, go ahead and		// If the induction variable has a constant integer step value, go ahead and
// get it now.		// get it now.
if (ID.getConstIntStepValue())		if (ID.getConstIntStepValue())
Step = ID.getConstIntStepValue();		Step = ID.getConstIntStepValue();

Show All 26 Lines	if (TruncType) {
}		}
if (!Step) {		if (!Step) {
SCEVExpander Exp(*PSE.getSE(), DL, "induction");		SCEVExpander Exp(*PSE.getSE(), DL, "induction");
Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(),		Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(),
&*Builder.GetInsertPoint());		&*Builder.GetInsertPoint());
}		}
}		}

// If an induction variable is only used for counting loop iterations or		// Splat the scalar induction variable, and build the necessary step vectors.
// calculating addresses, it shouldn't be widened. Scalarize the step vector
// to give InstCombine a better chance of simplifying it.
if (VF > 1 && ValuesNotWidened->count(IV)) {
for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part] = getScalarizedStepVector(ScalarIV, VF * Part, Step);
return;
}

// Finally, splat the scalar induction variable, and build the necessary step
// vectors.
Value *Broadcasted = getBroadcastInstrs(ScalarIV);		Value *Broadcasted = getBroadcastInstrs(ScalarIV);
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part] = getStepVector(Broadcasted, VF * Part, Step);		Entry[Part] = getStepVector(Broadcasted, VF * Part, Step);

		// If an induction variable is only used for counting loop iterations or
		// calculating addresses, it doesn't need to be widened. Create scalar steps
		// that can be used by instructions we will later scalarize. Note that the
		// addition of the scalar steps will not increase the number of instructions
		// in the loop in the common case prior to InstCombine. We will be trading
		// one vector extract for each scalar step.
		if (VF > 1 && ValuesNotWidened->count(IV)) {
		auto *EntryVal = Trunc ? cast<Value>(Trunc) : IV;
		buildScalarSteps(ScalarIV, Step, EntryVal);
		}
}		}

Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx,		Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx,
Value *Step) {		Value *Step) {
assert(Val->getType()->isVectorTy() && "Must be a vector");		assert(Val->getType()->isVectorTy() && "Must be a vector");
assert(Val->getType()->getScalarType()->isIntegerTy() &&		assert(Val->getType()->getScalarType()->isIntegerTy() &&
"Elem must be an integer");		"Elem must be an integer");
assert(Step->getType() == Val->getType()->getScalarType() &&		assert(Step->getType() == Val->getType()->getScalarType() &&
Show All 14 Lines	Value InnerLoopVectorizer::getStepVector(Value Val, int StartIdx,
Step = Builder.CreateVectorSplat(VLen, Step);		Step = Builder.CreateVectorSplat(VLen, Step);
assert(Step->getType() == Val->getType() && "Invalid step vec");		assert(Step->getType() == Val->getType() && "Invalid step vec");
// FIXME: The newly created binary instructions should contain nsw/nuw flags,		// FIXME: The newly created binary instructions should contain nsw/nuw flags,
// which can be found from the original scalar operations.		// which can be found from the original scalar operations.
Step = Builder.CreateMul(Cv, Step);		Step = Builder.CreateMul(Cv, Step);
return Builder.CreateAdd(Val, Step, "induction");		return Builder.CreateAdd(Val, Step, "induction");
}		}

Value *InnerLoopVectorizer::getScalarizedStepVector(Value *Val, int StartIdx,		void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
Value *Step) {		Value *EntryVal) {

// We can't create a vector with less than two elements.		// We shouldn't have to build scalar steps if we aren't vectorizing.
assert(VF > 1 && "VF should be greater than one");		assert(VF > 1 && "VF should be greater than one");

// Get the value type and ensure it and the step have the same integer type.		// Get the value type and ensure it and the step have the same integer type.
Type *ValTy = Val->getType()->getScalarType();		Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
assert(ValTy->isIntegerTy() && ValTy == Step->getType() &&		assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&
"Val and Step should have the same integer type");		"Val and Step should have the same integer type");

// Compute the scalarized step vector. We perform scalar arithmetic and then		// Compute the scalar steps and save the results in ScalarIVMap.
// insert the results into the step vector.		for (unsigned Part = 0; Part < UF; ++Part)
Value *StepVector = UndefValue::get(ToVectorTy(ValTy, VF));
for (unsigned I = 0; I < VF; ++I) {		for (unsigned I = 0; I < VF; ++I) {
auto *Mul = Builder.CreateMul(ConstantInt::get(ValTy, StartIdx + I), Step);		auto *StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + I);
auto *Add = Builder.CreateAdd(Val, Mul);		auto *Mul = Builder.CreateMul(StartIdx, Step);
StepVector = Builder.CreateInsertElement(StepVector, Add, I);		auto *Add = Builder.CreateAdd(ScalarIV, Mul);
		ScalarIVMap[EntryVal].push_back(Add);
}		}

return StepVector;
}		}

int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {		int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {
assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");		assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");
auto *SE = PSE.getSE();		auto *SE = PSE.getSE();
// Make sure that the pointer does not point to structs.		// Make sure that the pointer does not point to structs.
if (Ptr->getType()->getPointerElementType()->isAggregateType())		if (Ptr->getType()->getPointerElementType()->isAggregateType())
return 0;		return 0;
▲ Show 20 Lines • Show All 439 Lines • ▼ Show 20 Lines	if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {
if (i == InductionOperand ||		if (i == InductionOperand ||
(GepOperandInst && OrigLoop->contains(GepOperandInst))) {		(GepOperandInst && OrigLoop->contains(GepOperandInst))) {
assert((i == InductionOperand ||		assert((i == InductionOperand ||
PSE.getSE()->isLoopInvariant(PSE.getSCEV(GepOperandInst),		PSE.getSE()->isLoopInvariant(PSE.getSCEV(GepOperandInst),
OrigLoop)) &&		OrigLoop)) &&
"Must be last index or loop invariant");		"Must be last index or loop invariant");

VectorParts &GEPParts = getVectorValue(GepOperand);		VectorParts &GEPParts = getVectorValue(GepOperand);
Value *Index = GEPParts[0];
Index = Builder.CreateExtractElement(Index, Zero);		// If GepOperand is an induction variable, and there's a scalarized
		// version of it available, use it. Otherwise, we will need to create
		// an extractelement instruction.
		Value *Index = ScalarIVMap.count(GepOperand)
		? ScalarIVMap[GepOperand][0]
		: Builder.CreateExtractElement(GEPParts[0], Zero);

Gep2->setOperand(i, Index);		Gep2->setOperand(i, Index);
Gep2->setName("gep.indvar.idx");		Gep2->setName("gep.indvar.idx");
}		}
}		}
Ptr = Builder.Insert(Gep2);		Ptr = Builder.Insert(Gep2);
} else { // No GEP		} else { // No GEP
// Use the induction element ptr.		// Use the induction element ptr.
assert(isa<PHINode>(Ptr) && "Invalid induction ptr");		assert(isa<PHINode>(Ptr) && "Invalid induction ptr");
▲ Show 20 Lines • Show All 188 Lines • ▼ Show 20 Lines	for (unsigned Width = 0; Width < VF; ++Width) {
ConstantInt::get(Cmp->getType(), 1));		ConstantInt::get(Cmp->getType(), 1));
}		}

Instruction *Cloned = Instr->clone();		Instruction *Cloned = Instr->clone();
if (!IsVoidRetTy)		if (!IsVoidRetTy)
Cloned->setName(Instr->getName() + ".cloned");		Cloned->setName(Instr->getName() + ".cloned");
// Replace the operands of the cloned instructions with extracted scalars.		// Replace the operands of the cloned instructions with extracted scalars.
for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {		for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
Value *Op = Params[op][Part];
// Param is a vector. Need to extract the right lane.		// If the operand is an induction variable, and there's a scalarized
if (Op->getType()->isVectorTy())		// version of it available, use it. Otherwise, we will need to create
Op = Builder.CreateExtractElement(Op, Builder.getInt32(Width));		// an extractelement instruction if vectorizing.
Cloned->setOperand(op, Op);		auto *NewOp = Params[op][Part];
		auto *ScalarOp = Instr->getOperand(op);
		if (ScalarIVMap.count(ScalarOp))
		NewOp = ScalarIVMap[ScalarOp][VF * Part + Width];
		else if (NewOp->getType()->isVectorTy())
		NewOp = Builder.CreateExtractElement(NewOp, Builder.getInt32(Width));
		Cloned->setOperand(op, NewOp);
}		}
addNewMetadata(Cloned, Instr);		addNewMetadata(Cloned, Instr);

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

// If we just cloned a new assumption, add it the assumption cache.		// If we just cloned a new assumption, add it the assumption cache.
if (auto *II = dyn_cast<IntrinsicInst>(Cloned))		if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
▲ Show 20 Lines • Show All 1,488 Lines • ▼ Show 20 Lines	case Instruction::BitCast: {

// Optimize the special case where the source is a constant integer		// Optimize the special case where the source is a constant integer
// induction variable. Notice that we can only optimize the 'trunc' case		// induction variable. Notice that we can only optimize the 'trunc' case
// because (a) FP conversions lose precision, (b) sext/zext may wrap, and		// because (a) FP conversions lose precision, (b) sext/zext may wrap, and
// (c) other casts depend on pointer size.		// (c) other casts depend on pointer size.
auto ID = Legal->getInductionVars()->lookup(OldInduction);		auto ID = Legal->getInductionVars()->lookup(OldInduction);
if (isa<TruncInst>(CI) && CI->getOperand(0) == OldInduction &&		if (isa<TruncInst>(CI) && CI->getOperand(0) == OldInduction &&
ID.getConstIntStepValue()) {		ID.getConstIntStepValue()) {
auto *TruncType = cast<IntegerType>(CI->getType());		widenIntInduction(OldInduction, Entry, cast<TruncInst>(CI));
widenIntInduction(OldInduction, Entry, TruncType);
addMetadata(Entry, &*it);		addMetadata(Entry, &*it);
break;		break;
}		}

/// Vectorize casts.		/// Vectorize casts.
Type *DestTy =		Type *DestTy =
(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);		(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);

▲ Show 20 Lines • Show All 2,513 Lines • Show Last 20 Lines

test/Transforms/LoopVectorize/induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=IND			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=IND
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=UNROLL			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=UNROLL
				; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -S | FileCheck %s --check-prefix=UNROLL-NO-IC
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -enable-interleaved-mem-accesses -instcombine -S | FileCheck %s --check-prefix=INTERLEAVE			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -enable-interleaved-mem-accesses -instcombine -S | FileCheck %s --check-prefix=INTERLEAVE

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Make sure that we can handle multiple integer induction variables.			; Make sure that we can handle multiple integer induction variables.
	; CHECK-LABEL: @multi_int_induction(			; CHECK-LABEL: @multi_int_induction(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	}			}

	; Make sure we don't create a vector induction phi node that is unused.			; Make sure we don't create a vector induction phi node that is unused.
	; Scalarize the step vectors instead.			; Scalarize the step vectors instead.
	;			;
	; for (int i = 0; i < n; ++i)			; for (int i = 0; i < n; ++i)
	; sum += a[i];			; sum += a[i];
	;			;
				; CHECK-LABEL: @scalarize_induction_variable_01(
				; CHECK: vector.body:
				; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; CHECK: %[[i0:.+]] = add i64 %index, 0
				; CHECK: %[[i1:.+]] = add i64 %index, 1
				; CHECK: getelementptr inbounds i64, i64* %a, i64 %[[i0]]
				; CHECK: getelementptr inbounds i64, i64* %a, i64 %[[i1]]
				;
				; UNROLL-NO-IC-LABEL: @scalarize_induction_variable_01(
				; UNROLL-NO-IC: vector.body:
				; UNROLL-NO-IC: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; UNROLL-NO-IC: %[[i0:.+]] = add i64 %index, 0
				; UNROLL-NO-IC: %[[i1:.+]] = add i64 %index, 1
				; UNROLL-NO-IC: %[[i2:.+]] = add i64 %index, 2
				; UNROLL-NO-IC: %[[i3:.+]] = add i64 %index, 3
				; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i0]]
				; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i1]]
				; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i2]]
				; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i3]]
				;
	; IND-LABEL: @scalarize_induction_variable_01(			; IND-LABEL: @scalarize_induction_variable_01(
	; IND: vector.body:			; IND: vector.body:
	; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; IND-NOT: add i64 {{.*}}, 2			; IND-NOT: add i64 {{.*}}, 2
	; IND: getelementptr inbounds i64, i64* %a, i64 %index			; IND: getelementptr inbounds i64, i64* %a, i64 %index
	;			;
	; UNROLL-LABEL: @scalarize_induction_variable_01(			; UNROLL-LABEL: @scalarize_induction_variable_01(
	; UNROLL: vector.body:			; UNROLL: vector.body:
	Show All 23 Lines

	; Make sure we scalarize the step vectors used for the pointer arithmetic. We			; Make sure we scalarize the step vectors used for the pointer arithmetic. We
	; can't easily simplify vectorized step vectors.			; can't easily simplify vectorized step vectors.
	;			;
	; float s = 0;			; float s = 0;
	; for (int i ; 0; i < n; i += 8)			; for (int i ; 0; i < n; i += 8)
	; s += (a[i] + b[i] + 1.0f);			; s += (a[i] + b[i] + 1.0f);
	;			;
				; CHECK-LABEL: @scalarize_induction_variable_02(
				; CHECK: vector.body:
				; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; CHECK: %offset.idx = shl i64 %index, 3
				; CHECK: %[[i0:.+]] = add i64 %offset.idx, 0
				; CHECK: %[[i1:.+]] = add i64 %offset.idx, 8
				; CHECK: getelementptr inbounds float, float* %a, i64 %[[i0]]
				; CHECK: getelementptr inbounds float, float* %a, i64 %[[i1]]
				; CHECK: getelementptr inbounds float, float* %b, i64 %[[i0]]
				; CHECK: getelementptr inbounds float, float* %b, i64 %[[i1]]
				;
				; UNROLL-NO-IC-LABEL: @scalarize_induction_variable_02(
				; UNROLL-NO-IC: vector.body:
				; UNROLL-NO-IC: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; UNROLL-NO-IC: %offset.idx = shl i64 %index, 3
				; UNROLL-NO-IC: %[[i0:.+]] = add i64 %offset.idx, 0
				; UNROLL-NO-IC: %[[i1:.+]] = add i64 %offset.idx, 8
				; UNROLL-NO-IC: %[[i2:.+]] = add i64 %offset.idx, 16
				; UNROLL-NO-IC: %[[i3:.+]] = add i64 %offset.idx, 24
				; UNROLL-NO-IC: getelementptr inbounds float, float* %a, i64 %[[i0]]
				; UNROLL-NO-IC: getelementptr inbounds float, float* %a, i64 %[[i1]]
				; UNROLL-NO-IC: getelementptr inbounds float, float* %a, i64 %[[i2]]
				; UNROLL-NO-IC: getelementptr inbounds float, float* %a, i64 %[[i3]]
				; UNROLL-NO-IC: getelementptr inbounds float, float* %b, i64 %[[i0]]
				; UNROLL-NO-IC: getelementptr inbounds float, float* %b, i64 %[[i1]]
				; UNROLL-NO-IC: getelementptr inbounds float, float* %b, i64 %[[i2]]
				; UNROLL-NO-IC: getelementptr inbounds float, float* %b, i64 %[[i3]]
				;
	; IND-LABEL: @scalarize_induction_variable_02(			; IND-LABEL: @scalarize_induction_variable_02(
	; IND: vector.body:			; IND: vector.body:
	; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; IND: %[[i0:.+]] = shl i64 %index, 3			; IND: %[[i0:.+]] = shl i64 %index, 3
	; IND: %[[i1:.+]] = or i64 %[[i0]], 8			; IND: %[[i1:.+]] = or i64 %[[i0]], 8
	; IND: getelementptr inbounds float, float* %a, i64 %[[i0]]			; IND: getelementptr inbounds float, float* %a, i64 %[[i0]]
	; IND: getelementptr inbounds float, float* %a, i64 %[[i1]]			; IND: getelementptr inbounds float, float* %a, i64 %[[i1]]
	;			;
	▲ Show 20 Lines • Show All 363 Lines • Show Last 20 Lines

test/Transforms/LoopVectorize/reverse_induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Make sure consecutive vector generates correct negative indices.			; Make sure consecutive vector generates correct negative indices.
	; PR15882			; PR15882

	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 %startval, %index			; CHECK: %offset.idx = sub i64 %startval, %index
	; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0
	; CHECK: %[[v0:.+]] = insertelement <4 x i64> undef, i64 %[[a0]], i64 0
	; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1			; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1
	; CHECK: %[[v1:.+]] = insertelement <4 x i64> %[[v0]], i64 %[[a1]], i64 1
	; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2			; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2
	; CHECK: %[[v2:.+]] = insertelement <4 x i64> %[[v1]], i64 %[[a2]], i64 2
	; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3			; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3
	; CHECK: %[[v3:.+]] = insertelement <4 x i64> %[[v2]], i64 %[[a3]], i64 3
	; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4
	; CHECK: %[[v4:.+]] = insertelement <4 x i64> undef, i64 %[[a4]], i64 0
	; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5			; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5
	; CHECK: %[[v5:.+]] = insertelement <4 x i64> %[[v4]], i64 %[[a5]], i64 1
	; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6			; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6
	; CHECK: %[[v6:.+]] = insertelement <4 x i64> %[[v5]], i64 %[[a6]], i64 2
	; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7			; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7
	; CHECK: %[[v7:.+]] = insertelement <4 x i64> %[[v6]], i64 %[[a7]], i64 3

	define i32 @reverse_induction_i64(i64 %startval, i32 * %ptr) {			define i32 @reverse_induction_i64(i64 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i64 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i64 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	; (9 lines not shown)			; (9 lines not shown)
	loopend:			loopend:
	ret i32 %inc.redux			ret i32 %inc.redux
	}			}

	; CHECK-LABEL: @reverse_induction_i128(			; CHECK-LABEL: @reverse_induction_i128(
	; CHECK: %index = phi i128 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i128 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i128 %startval, %index			; CHECK: %offset.idx = sub i128 %startval, %index
	; CHECK: %[[a0:.+]] = add i128 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i128 %offset.idx, 0
	; CHECK: %[[v0:.+]] = insertelement <4 x i128> undef, i128 %[[a0]], i64 0
	; CHECK: %[[a1:.+]] = add i128 %offset.idx, -1			; CHECK: %[[a1:.+]] = add i128 %offset.idx, -1
	; CHECK: %[[v1:.+]] = insertelement <4 x i128> %[[v0]], i128 %[[a1]], i64 1
	; CHECK: %[[a2:.+]] = add i128 %offset.idx, -2			; CHECK: %[[a2:.+]] = add i128 %offset.idx, -2
	; CHECK: %[[v2:.+]] = insertelement <4 x i128> %[[v1]], i128 %[[a2]], i64 2
	; CHECK: %[[a3:.+]] = add i128 %offset.idx, -3			; CHECK: %[[a3:.+]] = add i128 %offset.idx, -3
	; CHECK: %[[v3:.+]] = insertelement <4 x i128> %[[v2]], i128 %[[a3]], i64 3
	; CHECK: %[[a4:.+]] = add i128 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i128 %offset.idx, -4
	; CHECK: %[[v4:.+]] = insertelement <4 x i128> undef, i128 %[[a4]], i64 0
	; CHECK: %[[a5:.+]] = add i128 %offset.idx, -5			; CHECK: %[[a5:.+]] = add i128 %offset.idx, -5
	; CHECK: %[[v5:.+]] = insertelement <4 x i128> %[[v4]], i128 %[[a5]], i64 1
	; CHECK: %[[a6:.+]] = add i128 %offset.idx, -6			; CHECK: %[[a6:.+]] = add i128 %offset.idx, -6
	; CHECK: %[[v6:.+]] = insertelement <4 x i128> %[[v5]], i128 %[[a6]], i64 2
	; CHECK: %[[a7:.+]] = add i128 %offset.idx, -7			; CHECK: %[[a7:.+]] = add i128 %offset.idx, -7
	; CHECK: %[[v7:.+]] = insertelement <4 x i128> %[[v6]], i128 %[[a7]], i64 3

	define i32 @reverse_induction_i128(i128 %startval, i32 * %ptr) {			define i32 @reverse_induction_i128(i128 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i128 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i128 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	; (9 lines not shown)			; (9 lines not shown)
	loopend:			loopend:
	ret i32 %inc.redux			ret i32 %inc.redux
	}			}

	; CHECK-LABEL: @reverse_induction_i16(			; CHECK-LABEL: @reverse_induction_i16(
	; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i16 %startval, {{.*}}			; CHECK: %offset.idx = sub i16 %startval, {{.*}}
	; CHECK: %[[a0:.+]] = add i16 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i16 %offset.idx, 0
	; CHECK: %[[v0:.+]] = insertelement <4 x i16> undef, i16 %[[a0]], i64 0
	; CHECK: %[[a1:.+]] = add i16 %offset.idx, -1			; CHECK: %[[a1:.+]] = add i16 %offset.idx, -1
	; CHECK: %[[v1:.+]] = insertelement <4 x i16> %[[v0]], i16 %[[a1]], i64 1
	; CHECK: %[[a2:.+]] = add i16 %offset.idx, -2			; CHECK: %[[a2:.+]] = add i16 %offset.idx, -2
	; CHECK: %[[v2:.+]] = insertelement <4 x i16> %[[v1]], i16 %[[a2]], i64 2
	; CHECK: %[[a3:.+]] = add i16 %offset.idx, -3			; CHECK: %[[a3:.+]] = add i16 %offset.idx, -3
	; CHECK: %[[v3:.+]] = insertelement <4 x i16> %[[v2]], i16 %[[a3]], i64 3
	; CHECK: %[[a4:.+]] = add i16 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i16 %offset.idx, -4
	; CHECK: %[[v4:.+]] = insertelement <4 x i16> undef, i16 %[[a4]], i64 0
	; CHECK: %[[a5:.+]] = add i16 %offset.idx, -5			; CHECK: %[[a5:.+]] = add i16 %offset.idx, -5
	; CHECK: %[[v5:.+]] = insertelement <4 x i16> %[[v4]], i16 %[[a5]], i64 1
	; CHECK: %[[a6:.+]] = add i16 %offset.idx, -6			; CHECK: %[[a6:.+]] = add i16 %offset.idx, -6
	; CHECK: %[[v6:.+]] = insertelement <4 x i16> %[[v5]], i16 %[[a6]], i64 2
	; CHECK: %[[a7:.+]] = add i16 %offset.idx, -7			; CHECK: %[[a7:.+]] = add i16 %offset.idx, -7
	; CHECK: %[[v7:.+]] = insertelement <4 x i16> %[[v6]], i16 %[[a7]], i64 3

	define i32 @reverse_induction_i16(i16 %startval, i32 * %ptr) {			define i32 @reverse_induction_i16(i16 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i16 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i16 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	; (remaining lines not shown)			; (remaining lines not shown)