This is an archive of the discontinued LLVM Phabricator instance.

Differential D19258

Loop vectorization with induction variable with non-constant step.
ClosedPublic

Authored by delena on Apr 19 2016, 4:09 AM.


Details

Reviewers

anemet
Ayal
aschwaighofer
hfinkel

Commits

rGc434d091c5e8: [LoopVectorize] Handling induction variable with non-constant step.
rL269023: [LoopVectorize] Handling induction variable with non-constant step.

Summary

One limitation of the loop vectorizer today is that it cannot handle an induction variable with a non-constant step.
This patch allows vectorization when the step is a loop-invariant value.
Here is an example of a loop that is vectorized after this patch:
int int_inc;
int bar(int init, int *restrict A, int N) {
  int x = init;
  for (int i = 0; i < N; i++) {
    A[i] = x;
    x += int_inc;
  }
  return x;
}
In this case, "x" is an induction variable with step "int_inc". The loop exit count is calculated from another induction variable, "i".
The following loop remains scalar right now:

for (int i=0; i<N; i+=int_inc) {
  A[i] = i;
}

In this case the vectorizer can't calculate the exit count.
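To make the first case concrete, here is a minimal hand-written sketch of what vectorizing that loop amounts to. It is not the code the vectorizer emits. The vector factor VF = 4, the function names, and passing the step as a parameter instead of reading the global are all assumptions made for illustration. Because the step is loop-invariant, the lane offsets (0, step, 2*step, 3*step) can be computed once, and the trip count still comes from "i".

```cpp
#include <cassert>
#include <vector>

// Scalar reference: the loop from the summary, with the step passed in
// explicitly (hypothetical signature) instead of read from a global.
int bar_scalar(int init, int *A, int N, int step) {
  int x = init;
  for (int i = 0; i < N; i++) {
    A[i] = x;
    x += step;
  }
  return x;
}

// Hand-vectorized sketch with an assumed vector factor of 4.
int bar_vectorized(int init, int *A, int N, int step) {
  assert(N >= 0 && "trip count must be non-negative");
  constexpr int VF = 4;

  // Step vector (0, step, 2*step, 3*step). It can be built once outside
  // the loop because step does not change inside the loop.
  int step_vec[VF];
  for (int lane = 0; lane < VF; lane++)
    step_vec[lane] = lane * step;

  int x = init;
  int i = 0;
  // Vector body: each iteration handles VF consecutive elements.
  for (; i + VF <= N; i += VF) {
    for (int lane = 0; lane < VF; lane++)
      A[i + lane] = x + step_vec[lane]; // broadcast(x) + step vector
    x += VF * step;
  }
  // Scalar remainder for the last N % VF elements.
  for (; i < N; i++) {
    A[i] = x;
    x += step;
  }
  return x;
}
```

Both functions should fill A identically and return the same final value of x for any non-negative N, as long as the arithmetic does not overflow.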

Diff Detail

Repository: rL LLVM

Event Timeline

delena updated this revision to Diff 54172. Apr 19 2016, 4:09 AM

delena retitled this revision to Loop vectorization with induction variable with non-constant step.

delena updated this object.

delena added reviewers: Ayal, anemet, hfinkel.

delena set the repository for this revision to rL LLVM.

delena added a subscriber: llvm-commits.

Herald added a subscriber: mzolotukhin. Apr 19 2016, 4:09 AM

Ping.

hfinkel added inline comments. Apr 26 2016, 3:38 PM

../lib/Transforms/Utils/LoopUtils.cpp
738–739 ↗	(On Diff #54172)	I don't think this is a good way to handle this, because we'll run into all kinds of non-trivial situations where SCEV has computed an expression for the step that does not exactly correspond to a pre-existing IR value. We won't handle those cases, and it will be nearly impossible for a user to understand why. Instead, convert StepValue in InductionDescriptor to a const SCEV *, and then when we need the Value when generating IR, get one from the SCEVBuilder. It will reuse an existing Value* should one exist.

delena added inline comments. Apr 27 2016, 11:17 AM

../lib/Transforms/Utils/LoopUtils.cpp
738–739 ↗	(On Diff #54172)	I can't find a way to build a Value from SCEV. If the Step is "SCEVAddRecExpr", how do I get Value* from it?

mssimpso added a subscriber: mssimpso. Apr 27 2016, 11:40 AM

mssimpso added inline comments.

../lib/Transforms/Utils/LoopUtils.cpp
738–739 ↗	(On Diff #54172)	Elena, I think Hal was suggesting you use a SCEVExpander to build a Value from a SCEV expression. SCEVExpanders are used elsewhere in the loop vectorizer.

hfinkel added inline comments. Apr 28 2016, 7:41 AM

../lib/Transforms/Utils/LoopUtils.cpp
738–739 ↗	(On Diff #54172)	Yes, exactly. SCEVExpander will reuse an existing value if an appropriate one is known, or otherwise materialize some appropriate IR value.

Following Hal's and Matthew's comments, I now keep Step as a SCEV inside InductionDescriptor. This allows vectorizing a loop whose induction variable has any *loop-invariant* step.
Again, we are talking about a "secondary" induction variable, which is updated inside the loop but does not participate in the trip count calculation.

I added one more example, where the step is a SCEV from an outer loop:

for (int j = 0; j < M; j++) {
  for (int i=0; i<N; i++){
    A[i] = x;
    x += j; // loop-invariant step
  }
}
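The reason this nested case qualifies can be sketched in plain C++: "j" changes between outer iterations but is fixed for the whole inner loop, so inside the inner loop "x" has the closed form start + i * j. The function names and the explicit initial value x0 below are assumptions, since the snippet above does not show how "x" is initialized.

```cpp
#include <cassert>
#include <vector>

// Scalar form of the nested loop; N is taken from the size of A.
int nested_scalar(int x0, std::vector<int> &A, int M) {
  const int N = static_cast<int>(A.size());
  int x = x0;
  for (int j = 0; j < M; j++) {
    for (int i = 0; i < N; i++) {
      A[i] = x;
      x += j; // step is the outer induction variable
    }
  }
  return x;
}

// Same computation with the inner loop written in closed form. Each inner
// iteration no longer depends on the previous one, which is the property
// that lets the inner loop be vectorized.
int nested_closed_form(int x0, std::vector<int> &A, int M) {
  assert(M >= 0 && "outer trip count must be non-negative");
  const int N = static_cast<int>(A.size());
  int x = x0;
  for (int j = 0; j < M; j++) {
    const int start = x; // value of x on entry to the inner loop
    for (int i = 0; i < N; i++)
      A[i] = start + i * j;
    x = start + N * j; // value of x after the inner loop
  }
  return x;
}
```

Both versions should produce the same contents in A and the same return value, provided the arithmetic does not overflow.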

Herald added a subscriber: sanjoy. Apr 29 2016, 5:49 AM

hfinkel added inline comments. Apr 29 2016, 8:09 AM

../lib/Transforms/Utils/LoopUtils.cpp
661 ↗	(On Diff #55587)	Comments should be complete sentences and end with a period.
668 ↗	(On Diff #55587)	See above.
703 ↗	(On Diff #55587)	The logic here is now repeating things that SCEVExpander should already know how to do (plus SCEV might know about even more simplification rules). Try calculating the expression using SCEV and then expand that. This seems to be computing Start + Index*Step, so we can generate: const SCEV *S = SE->getAddExpr(SE->getSCEV(Start), SE->getMulExpr(SE->getSCEV(Index), Step)); return Exp.expandCodeFor(S, StartValue->getType(), &*B.GetInsertPoint()); If you use StartValue->getType() as I did above, this might just work for the pointer inductions too.

delena added inline comments. May 3 2016, 6:41 AM

../lib/Transforms/Utils/LoopUtils.cpp
703 ↗	(On Diff #55587)	I tried to apply your proposal. The generated code is correct but does not always match the original, and some tests fail because of this replacement. This happens because getAddExpr() tries to combine all addends recursively. In some cases the result is better, but I saw at least one case where the code became less optimal.

I added SCEV calculation where it was possible without changing the existing tests.
I also changed some comments.

Hi Hal,

I checked again. SE->getAddExpr(SE->getSCEV(A), SE->getSCEV(B)) does not always give better results than just createAdd(A, B).
This happens when we mix getAddExpr on SCEVs with ADD/SUB operations. As a result, we get redundant intermediate values that are calculated in different ways, and InstCombine is unable to reduce them all.

Could you please continue reviewing this patch? Thank you.

hfinkel accepted this revision. May 9 2016, 3:47 PM

hfinkel edited edge metadata.

hfinkel added inline comments.

../lib/Transforms/Utils/LoopUtils.cpp
702 ↗	(On Diff #55988)	Okay. That's unfortunate. Please add a FIXME comment here explaining the situation. Otherwise, this LGTM.

This revision is now accepted and ready to land. May 9 2016, 3:47 PM

Thanks a lot for the review! I added a comment about SCEV expressions.
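The outcome of this discussion, for the integer case, can be modeled with a small toy. This is only a sketch: it returns textual pseudo-instructions rather than real IR, and the function name and string format are invented for illustration. The nullable pointer mimics getConstIntStepValue(), which returns nullptr when the step is not a constant.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the integer branch of InductionDescriptor::transform().
// const_step is nullptr when the step is loop-invariant but not constant.
std::vector<std::string> emit_int_transform(const long *const_step) {
  // Constant step -1: a single subtraction, as before the patch.
  if (const_step && *const_step == -1)
    return {"sub start, index"};
  // Constant step 1: a single addition, as before the patch.
  if (const_step && *const_step == 1)
    return {"add start, index"};
  // Everything else: conceptually start + step * index. In the patch this
  // is built as a SCEV expression and handed to SCEVExpander, which may
  // simplify it further or reuse existing values.
  return {"mul step, index", "add start, mul"};
}
```

Keeping the two special cases is what preserves the existing test output; the FIXME records that a single SCEV-based path would be simpler if the generated code did not regress.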

Closed by commit rL269023: [LoopVectorize] Handling induction variable with non-constant step. (authored by delena). May 10 2016, 12:39 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h	12 lines
llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp	92 lines
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp	76 lines
llvm/trunk/test/Transforms/LoopVectorize/induction-step.ll	124 lines

Diff 56667

llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h

Show First 20 Lines • Show All 263 Lines • ▼ Show 20 Lines	enum InductionKind {
IK_NoInduction, ///< Not an induction variable.		IK_NoInduction, ///< Not an induction variable.
IK_IntInduction, ///< Integer induction variable. Step = C.		IK_IntInduction, ///< Integer induction variable. Step = C.
IK_PtrInduction ///< Pointer induction var. Step = C / sizeof(elem).		IK_PtrInduction ///< Pointer induction var. Step = C / sizeof(elem).
};		};

public:		public:
/// Default constructor - creates an invalid induction.		/// Default constructor - creates an invalid induction.
InductionDescriptor()		InductionDescriptor()
: StartValue(nullptr), IK(IK_NoInduction), StepValue(nullptr) {}		: StartValue(nullptr), IK(IK_NoInduction), Step(nullptr) {}

/// Get the consecutive direction. Returns:		/// Get the consecutive direction. Returns:
/// 0 - unknown or non-consecutive.		/// 0 - unknown or non-consecutive.
/// 1 - consecutive and increasing.		/// 1 - consecutive and increasing.
/// -1 - consecutive and decreasing.		/// -1 - consecutive and decreasing.
int getConsecutiveDirection() const;		int getConsecutiveDirection() const;

/// Compute the transformed value of Index at offset StartValue using step		/// Compute the transformed value of Index at offset StartValue using step
/// StepValue.		/// StepValue.
/// For integer induction, returns StartValue + Index * StepValue.		/// For integer induction, returns StartValue + Index * StepValue.
/// For pointer induction, returns StartValue[Index * StepValue].		/// For pointer induction, returns StartValue[Index * StepValue].
/// FIXME: The newly created binary instructions should contain nsw/nuw		/// FIXME: The newly created binary instructions should contain nsw/nuw
/// flags, which can be found from the original scalar operations.		/// flags, which can be found from the original scalar operations.
Value *transform(IRBuilder<> &B, Value *Index) const;		Value *transform(IRBuilder<> &B, Value *Index, ScalarEvolution *SE,
		const DataLayout& DL) const;

Value *getStartValue() const { return StartValue; }		Value *getStartValue() const { return StartValue; }
InductionKind getKind() const { return IK; }		InductionKind getKind() const { return IK; }
ConstantInt *getStepValue() const { return StepValue; }		const SCEV *getStep() const { return Step; }
		ConstantInt *getConstIntStepValue() const;

/// Returns true if \p Phi is an induction. If \p Phi is an induction,		/// Returns true if \p Phi is an induction. If \p Phi is an induction,
/// the induction descriptor \p D will contain the data describing this		/// the induction descriptor \p D will contain the data describing this
/// induction. If by some other means the caller has a better SCEV		/// induction. If by some other means the caller has a better SCEV
/// expression for \p Phi than the one returned by the ScalarEvolution		/// expression for \p Phi than the one returned by the ScalarEvolution
/// analysis, it can be passed through \p Expr.		/// analysis, it can be passed through \p Expr.
static bool isInductionPHI(PHINode *Phi, ScalarEvolution *SE,		static bool isInductionPHI(PHINode *Phi, ScalarEvolution *SE,
InductionDescriptor &D,		InductionDescriptor &D,
const SCEV *Expr = nullptr);		const SCEV *Expr = nullptr);

/// Returns true if \p Phi is an induction, in the context associated with		/// Returns true if \p Phi is an induction, in the context associated with
/// the run-time predicate of PSE. If \p Assume is true, this can add further		/// the run-time predicate of PSE. If \p Assume is true, this can add further
/// SCEV predicates to \p PSE in order to prove that \p Phi is an induction.		/// SCEV predicates to \p PSE in order to prove that \p Phi is an induction.
/// If \p Phi is an induction, \p D will contain the data describing this		/// If \p Phi is an induction, \p D will contain the data describing this
/// induction.		/// induction.
static bool isInductionPHI(PHINode *Phi, PredicatedScalarEvolution &PSE,		static bool isInductionPHI(PHINode *Phi, PredicatedScalarEvolution &PSE,
InductionDescriptor &D, bool Assume = false);		InductionDescriptor &D, bool Assume = false);

private:		private:
/// Private constructor - used by \c isInductionPHI.		/// Private constructor - used by \c isInductionPHI.
InductionDescriptor(Value *Start, InductionKind K, ConstantInt *Step);		InductionDescriptor(Value *Start, InductionKind K, const SCEV *Step);

/// Start value.		/// Start value.
TrackingVH<Value> StartValue;		TrackingVH<Value> StartValue;
/// Induction kind.		/// Induction kind.
InductionKind IK;		InductionKind IK;
/// Step value.		/// Step value.
ConstantInt *StepValue;		const SCEV *Step;
};		};

BasicBlock *InsertPreheaderForLoop(Loop *L, DominatorTree *DT, LoopInfo *LI,		BasicBlock *InsertPreheaderForLoop(Loop *L, DominatorTree *DT, LoopInfo *LI,
bool PreserveLCSSA);		bool PreserveLCSSA);

/// \brief Simplify each loop in a loop nest recursively.		/// \brief Simplify each loop in a loop nest recursively.
///		///
/// This takes a potentially un-simplified loop L (and its children) and turns		/// This takes a potentially un-simplified loop L (and its children) and turns
▲ Show 20 Lines • Show All 96 Lines • Show Last 20 Lines

llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp

Show All 10 Lines
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"		#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
		#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"		#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
▲ Show 20 Lines • Show All 621 Lines • ▼ Show 20 Lines	Value *RecurrenceDescriptor::createMinMaxOp(IRBuilder<> &Builder,
else		else
Cmp = Builder.CreateICmp(P, Left, Right, "rdx.minmax.cmp");		Cmp = Builder.CreateICmp(P, Left, Right, "rdx.minmax.cmp");

Value *Select = Builder.CreateSelect(Cmp, Left, Right, "rdx.minmax.select");		Value *Select = Builder.CreateSelect(Cmp, Left, Right, "rdx.minmax.select");
return Select;		return Select;
}		}

InductionDescriptor::InductionDescriptor(Value *Start, InductionKind K,		InductionDescriptor::InductionDescriptor(Value *Start, InductionKind K,
ConstantInt *Step)		const SCEV *Step)
: StartValue(Start), IK(K), StepValue(Step) {		: StartValue(Start), IK(K), Step(Step) {
assert(IK != IK_NoInduction && "Not an induction");		assert(IK != IK_NoInduction && "Not an induction");

		// Start value type should match the induction kind and the value
		// itself should not be null.
assert(StartValue && "StartValue is null");		assert(StartValue && "StartValue is null");
assert(StepValue && !StepValue->isZero() && "StepValue is zero");
assert((IK != IK_PtrInduction || StartValue->getType()->isPointerTy()) &&		assert((IK != IK_PtrInduction || StartValue->getType()->isPointerTy()) &&
"StartValue is not a pointer for pointer induction");		"StartValue is not a pointer for pointer induction");
assert((IK != IK_IntInduction || StartValue->getType()->isIntegerTy()) &&		assert((IK != IK_IntInduction || StartValue->getType()->isIntegerTy()) &&
"StartValue is not an integer for integer induction");		"StartValue is not an integer for integer induction");
assert(StepValue->getType()->isIntegerTy() &&
"StepValue is not an integer");		// Check the Step Value. It should be non-zero integer value.
		assert((!getConstIntStepValue() || !getConstIntStepValue()->isZero()) &&
		"Step value is zero");

		assert((IK != IK_PtrInduction || getConstIntStepValue()) &&
		"Step value should be constant for pointer induction");
		assert(Step->getType()->isIntegerTy() && "StepValue is not an integer");
}		}

int InductionDescriptor::getConsecutiveDirection() const {		int InductionDescriptor::getConsecutiveDirection() const {
if (StepValue && (StepValue->isOne() || StepValue->isMinusOne()))		ConstantInt *ConstStep = getConstIntStepValue();
return StepValue->getSExtValue();		if (ConstStep && (ConstStep->isOne() || ConstStep->isMinusOne()))
		return ConstStep->getSExtValue();
return 0;		return 0;
}		}

Value *InductionDescriptor::transform(IRBuilder<> &B, Value *Index) const {		ConstantInt *InductionDescriptor::getConstIntStepValue() const {
		if (isa<SCEVConstant>(Step))
		return dyn_cast<ConstantInt>(cast<SCEVConstant>(Step)->getValue());
		return nullptr;
		}

		Value *InductionDescriptor::transform(IRBuilder<> &B, Value *Index,
		ScalarEvolution *SE,
		const DataLayout& DL) const {

		SCEVExpander Exp(*SE, DL, "induction");
switch (IK) {		switch (IK) {
case IK_IntInduction:		case IK_IntInduction: {
assert(Index->getType() == StartValue->getType() &&		assert(Index->getType() == StartValue->getType() &&
"Index type does not match StartValue type");		"Index type does not match StartValue type");
if (StepValue->isMinusOne())
		// FIXME: Theoretically, we can call getAddExpr() of ScalarEvolution
		// and calculate (Start + Index * Step) for all cases, without
		// special handling for "isOne" and "isMinusOne".
		// But in the real life the result code getting worse. We mix SCEV
		// expressions and ADD/SUB operations and receive redundant
		// intermediate values being calculated in different ways and
		// Instcombine is unable to reduce them all.

		if (getConstIntStepValue() &&
		getConstIntStepValue()->isMinusOne())
return B.CreateSub(StartValue, Index);		return B.CreateSub(StartValue, Index);
if (!StepValue->isOne())		if (getConstIntStepValue() &&
Index = B.CreateMul(Index, StepValue);		getConstIntStepValue()->isOne())
return B.CreateAdd(StartValue, Index);		return B.CreateAdd(StartValue, Index);
		const SCEV *S = SE->getAddExpr(SE->getSCEV(StartValue),
case IK_PtrInduction:		SE->getMulExpr(Step, SE->getSCEV(Index)));
assert(Index->getType() == StepValue->getType() &&		return Exp.expandCodeFor(S, StartValue->getType(), &*B.GetInsertPoint());
		}
		case IK_PtrInduction: {
		assert(Index->getType() == Step->getType() &&
"Index type does not match StepValue type");		"Index type does not match StepValue type");
if (StepValue->isMinusOne())		assert(isa<SCEVConstant>(Step) &&
Index = B.CreateNeg(Index);		"Expected constant step for pointer induction");
else if (!StepValue->isOne())		const SCEV *S = SE->getMulExpr(SE->getSCEV(Index), Step);
Index = B.CreateMul(Index, StepValue);		Index = Exp.expandCodeFor(S, Index->getType(), &*B.GetInsertPoint());
return B.CreateGEP(nullptr, StartValue, Index);		return B.CreateGEP(nullptr, StartValue, Index);
		}
case IK_NoInduction:		case IK_NoInduction:
return nullptr;		return nullptr;
}		}
llvm_unreachable("invalid enum");		llvm_unreachable("invalid enum");
}		}

bool InductionDescriptor::isInductionPHI(PHINode *Phi,		bool InductionDescriptor::isInductionPHI(PHINode *Phi,
PredicatedScalarEvolution &PSE,		PredicatedScalarEvolution &PSE,
Show All 38 Lines	bool InductionDescriptor::isInductionPHI(PHINode *Phi,
}		}

assert(AR->getLoop()->getHeader() == Phi->getParent() &&		assert(AR->getLoop()->getHeader() == Phi->getParent() &&
"PHI is an AddRec for a different loop?!");		"PHI is an AddRec for a different loop?!");
Value *StartValue =		Value *StartValue =
Phi->getIncomingValueForBlock(AR->getLoop()->getLoopPreheader());		Phi->getIncomingValueForBlock(AR->getLoop()->getLoopPreheader());
const SCEV *Step = AR->getStepRecurrence(*SE);		const SCEV *Step = AR->getStepRecurrence(*SE);
// Calculate the pointer stride and check if it is consecutive.		// Calculate the pointer stride and check if it is consecutive.
const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);		// The stride may be a constant or a loop invariant integer value.
if (!C)		const SCEVConstant *ConstStep = dyn_cast<SCEVConstant>(Step);
		if (!ConstStep && !SE->isLoopInvariant(Step, AR->getLoop()))
return false;		return false;

ConstantInt *CV = C->getValue();
if (PhiTy->isIntegerTy()) {		if (PhiTy->isIntegerTy()) {
D = InductionDescriptor(StartValue, IK_IntInduction, CV);		D = InductionDescriptor(StartValue, IK_IntInduction, Step);
return true;		return true;
}		}

assert(PhiTy->isPointerTy() && "The PHI must be a pointer");		assert(PhiTy->isPointerTy() && "The PHI must be a pointer");
		// Pointer induction should be a constant.
		if (!ConstStep)
		return false;

		ConstantInt *CV = ConstStep->getValue();
Type *PointerElementType = PhiTy->getPointerElementType();		Type *PointerElementType = PhiTy->getPointerElementType();
// The pointer stride cannot be determined if the pointer element type is not		// The pointer stride cannot be determined if the pointer element type is not
// sized.		// sized.
if (!PointerElementType->isSized())		if (!PointerElementType->isSized())
return false;		return false;

const DataLayout &DL = Phi->getModule()->getDataLayout();		const DataLayout &DL = Phi->getModule()->getDataLayout();
int64_t Size = static_cast<int64_t>(DL.getTypeAllocSize(PointerElementType));		int64_t Size = static_cast<int64_t>(DL.getTypeAllocSize(PointerElementType));
if (!Size)		if (!Size)
return false;		return false;

int64_t CVSize = CV->getSExtValue();		int64_t CVSize = CV->getSExtValue();
if (CVSize % Size)		if (CVSize % Size)
return false;		return false;
auto *StepValue = ConstantInt::getSigned(CV->getType(), CVSize / Size);		auto *StepValue = SE->getConstant(CV->getType(), CVSize / Size,
		true /* signed */);
D = InductionDescriptor(StartValue, IK_PtrInduction, StepValue);		D = InductionDescriptor(StartValue, IK_PtrInduction, StepValue);
return true;		return true;
}		}

/// \brief Returns the instructions that use values defined in the loop.		/// \brief Returns the instructions that use values defined in the loop.
SmallVector<Instruction *, 8> llvm::findDefsUsedOutsideOfLoop(Loop *L) {		SmallVector<Instruction *, 8> llvm::findDefsUsedOutsideOfLoop(Loop *L) {
SmallVector<Instruction *, 8> UsedOutside;		SmallVector<Instruction *, 8> UsedOutside;

▲ Show 20 Lines • Show All 106 Lines • Show Last 20 Lines

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 410 Lines • ▼ Show 20 Lines	protected:
/// this is needed because each iteration in the loop corresponds to a SIMD		/// this is needed because each iteration in the loop corresponds to a SIMD
/// element.		/// element.
virtual Value *getBroadcastInstrs(Value *V);		virtual Value *getBroadcastInstrs(Value *V);

/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)		/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)
/// to each vector element of Val. The sequence starts at StartIndex.		/// to each vector element of Val. The sequence starts at StartIndex.
virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step);		virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step);

		/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)
		/// to each vector element of Val. The sequence starts at StartIndex.
		/// Step is a SCEV. In order to get StepValue it takes the existing value
		/// from SCEV or creates a new using SCEVExpander.
		virtual Value *getStepVector(Value *Val, int StartIdx, const SCEV *Step);

/// When we go over instructions in the basic block we rely on previous		/// When we go over instructions in the basic block we rely on previous
/// values within the current basic block or on loop invariant values.		/// values within the current basic block or on loop invariant values.
/// When we widen (vectorize) values we place them in the map. If the values		/// When we widen (vectorize) values we place them in the map. If the values
/// are not within the map, they have to be loop invariant, so we simply		/// are not within the map, they have to be loop invariant, so we simply
/// broadcast them into a vector.		/// broadcast them into a vector.
VectorParts &getVectorValue(Value *V);		VectorParts &getVectorValue(Value *V);

/// Try to vectorize the interleaved access group that \p Instr belongs to.		/// Try to vectorize the interleaved access group that \p Instr belongs to.
▲ Show 20 Lines • Show All 168 Lines • ▼ Show 20 Lines	InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
UnrollFactor) {}		UnrollFactor) {}

private:		private:
void scalarizeInstruction(Instruction *Instr,		void scalarizeInstruction(Instruction *Instr,
bool IfPredicateStore = false) override;		bool IfPredicateStore = false) override;
void vectorizeMemoryInstruction(Instruction *Instr) override;		void vectorizeMemoryInstruction(Instruction *Instr) override;
Value *getBroadcastInstrs(Value *V) override;		Value *getBroadcastInstrs(Value *V) override;
Value *getStepVector(Value *Val, int StartIdx, Value *Step) override;		Value *getStepVector(Value *Val, int StartIdx, Value *Step) override;
		Value *getStepVector(Value *Val, int StartIdx, const SCEV *StepSCEV) override;
Value *reverseVector(Value *Vec) override;		Value *reverseVector(Value *Vec) override;
};		};

/// \brief Look for a meaningful debug location on the instruction or it's		/// \brief Look for a meaningful debug location on the instruction or it's
/// operands.		/// operands.
static Instruction *getDebugLocFromInstOrOperands(Instruction *I) {		static Instruction *getDebugLocFromInstOrOperands(Instruction *I) {
if (!I)		if (!I)
return I;		return I;
▲ Show 20 Lines • Show All 1,460 Lines • ▼ Show 20 Lines	Value *InnerLoopVectorizer::getBroadcastInstrs(Value *V) {

// Broadcast the scalar into all locations in the vector.		// Broadcast the scalar into all locations in the vector.
Value *Shuf = Builder.CreateVectorSplat(VF, V, "broadcast");		Value *Shuf = Builder.CreateVectorSplat(VF, V, "broadcast");

return Shuf;		return Shuf;
}		}

Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx,		Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx,
		const SCEV *StepSCEV) {
		const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();
		SCEVExpander Exp(*PSE.getSE(), DL, "induction");
		Value *StepValue = Exp.expandCodeFor(StepSCEV, StepSCEV->getType(),
		&*Builder.GetInsertPoint());
		return getStepVector(Val, StartIdx, StepValue);
		}

		Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx,
Value *Step) {		Value *Step) {
assert(Val->getType()->isVectorTy() && "Must be a vector");		assert(Val->getType()->isVectorTy() && "Must be a vector");
assert(Val->getType()->getScalarType()->isIntegerTy() &&		assert(Val->getType()->getScalarType()->isIntegerTy() &&
"Elem must be an integer");		"Elem must be an integer");
assert(Step->getType() == Val->getType()->getScalarType() &&		assert(Step->getType() == Val->getType()->getScalarType() &&
"Step has wrong type");		"Step has wrong type");
// Create the types.		// Create the types.
Type *ITy = Val->getType()->getScalarType();		Type *ITy = Val->getType()->getScalarType();
▲ Show 20 Lines • Show All 1,049 Lines • ▼ Show 20 Lines	for (I = List->begin(), E = List->end(); I != E; ++I) {
PHINode *BCResumeVal = PHINode::Create(		PHINode *BCResumeVal = PHINode::Create(
OrigPhi->getType(), 3, "bc.resume.val", ScalarPH->getTerminator());		OrigPhi->getType(), 3, "bc.resume.val", ScalarPH->getTerminator());
Value *EndValue;		Value *EndValue;
if (OrigPhi == OldInduction) {		if (OrigPhi == OldInduction) {
// We know what the end value is.		// We know what the end value is.
EndValue = CountRoundDown;		EndValue = CountRoundDown;
} else {		} else {
IRBuilder<> B(LoopBypassBlocks.back()->getTerminator());		IRBuilder<> B(LoopBypassBlocks.back()->getTerminator());
Value *CRD = B.CreateSExtOrTrunc(		Value *CRD = B.CreateSExtOrTrunc(CountRoundDown,
CountRoundDown, II.getStepValue()->getType(), "cast.crd");		II.getStep()->getType(), "cast.crd");
EndValue = II.transform(B, CRD);		const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();
		EndValue = II.transform(B, CRD, PSE.getSE(), DL);
EndValue->setName("ind.end");		EndValue->setName("ind.end");
}		}

// The new PHI merges the original incoming value, in case of a bypass,		// The new PHI merges the original incoming value, in case of a bypass,
// or the value at the end of the vectorized loop.		// or the value at the end of the vectorized loop.
BCResumeVal->addIncoming(EndValue, MiddleBlock);		BCResumeVal->addIncoming(EndValue, MiddleBlock);

// Fix the scalar body counter (PHI node).		// Fix the scalar body counter (PHI node).
▲ Show 20 Lines • Show All 875 Lines • ▼ Show 20 Lines	if (P->getParent() != OrigLoop->getHeader()) {
return;		return;
}		}

// This PHINode must be an induction variable.		// This PHINode must be an induction variable.
// Make sure that we know about it.		// Make sure that we know about it.
assert(Legal->getInductionVars()->count(P) && "Not an induction variable");		assert(Legal->getInductionVars()->count(P) && "Not an induction variable");

InductionDescriptor II = Legal->getInductionVars()->lookup(P);		InductionDescriptor II = Legal->getInductionVars()->lookup(P);
		const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();

// FIXME: The newly created binary instructions should contain nsw/nuw flags,		// FIXME: The newly created binary instructions should contain nsw/nuw flags,
// which can be found from the original scalar operations.		// which can be found from the original scalar operations.
switch (II.getKind()) {		switch (II.getKind()) {
case InductionDescriptor::IK_NoInduction:		case InductionDescriptor::IK_NoInduction:
llvm_unreachable("Unknown induction");		llvm_unreachable("Unknown induction");
case InductionDescriptor::IK_IntInduction: {		case InductionDescriptor::IK_IntInduction: {
assert(P->getType() == II.getStartValue()->getType() && "Types must match");		assert(P->getType() == II.getStartValue()->getType() && "Types must match");
// Handle other induction variables that are now based on the		// Handle other induction variables that are now based on the
// canonical one.		// canonical one.
Value *V = Induction;		Value *V = Induction;
if (P != OldInduction) {		if (P != OldInduction) {
V = Builder.CreateSExtOrTrunc(Induction, P->getType());		V = Builder.CreateSExtOrTrunc(Induction, P->getType());
V = II.transform(Builder, V);		V = II.transform(Builder, V, PSE.getSE(), DL);
V->setName("offset.idx");		V->setName("offset.idx");
}		}
Value *Broadcasted = getBroadcastInstrs(V);		Value *Broadcasted = getBroadcastInstrs(V);
// After broadcasting the induction variable we need to make the vector		// After broadcasting the induction variable we need to make the vector
// consecutive by adding 0, 1, 2, etc.		// consecutive by adding 0, 1, 2, etc.
for (unsigned part = 0; part < UF; ++part)		for (unsigned part = 0; part < UF; ++part)
Entry[part] = getStepVector(Broadcasted, VF * part, II.getStepValue());		Entry[part] = getStepVector(Broadcasted, VF * part, II.getStep());
return;		return;
}		}
case InductionDescriptor::IK_PtrInduction:		case InductionDescriptor::IK_PtrInduction:
// Handle the pointer induction variable case.		// Handle the pointer induction variable case.
assert(P->getType()->isPointerTy() && "Unexpected type.");		assert(P->getType()->isPointerTy() && "Unexpected type.");
// This is the normalized GEP that starts counting at zero.		// This is the normalized GEP that starts counting at zero.
Value *PtrInd = Induction;		Value *PtrInd = Induction;
PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStepValue()->getType());		PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());
// This is the vector of results. Notice that we don't generate		// This is the vector of results. Notice that we don't generate
// vector geps because scalar geps result in better code.		// vector geps because scalar geps result in better code.
for (unsigned part = 0; part < UF; ++part) {		for (unsigned part = 0; part < UF; ++part) {
if (VF == 1) {		if (VF == 1) {
int EltIndex = part;		int EltIndex = part;
Constant *Idx = ConstantInt::get(PtrInd->getType(), EltIndex);		Constant *Idx = ConstantInt::get(PtrInd->getType(), EltIndex);
Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep = II.transform(Builder, GlobalIdx);		Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL);
SclrGep->setName("next.gep");		SclrGep->setName("next.gep");
Entry[part] = SclrGep;		Entry[part] = SclrGep;
continue;		continue;
}		}

Value *VecVal = UndefValue::get(VectorType::get(P->getType(), VF));		Value *VecVal = UndefValue::get(VectorType::get(P->getType(), VF));
for (unsigned int i = 0; i < VF; ++i) {		for (unsigned int i = 0; i < VF; ++i) {
int EltIndex = i + part * VF;		int EltIndex = i + part * VF;
Constant *Idx = ConstantInt::get(PtrInd->getType(), EltIndex);		Constant *Idx = ConstantInt::get(PtrInd->getType(), EltIndex);
Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep = II.transform(Builder, GlobalIdx);		Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL);
SclrGep->setName("next.gep");		SclrGep->setName("next.gep");
VecVal = Builder.CreateInsertElement(VecVal, SclrGep,		VecVal = Builder.CreateInsertElement(VecVal, SclrGep,
Builder.getInt32(i), "insert.gep");		Builder.getInt32(i), "insert.gep");
}		}
Entry[part] = VecVal;		Entry[part] = VecVal;
}		}
return;		return;
}		}
(lines elided)	for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
CastInst *CI = dyn_cast<CastInst>(it);		CastInst *CI = dyn_cast<CastInst>(it);
setDebugLocFromInst(Builder, &*it);		setDebugLocFromInst(Builder, &*it);
/// Optimize the special case where the source is the induction		/// Optimize the special case where the source is a constant integer
/// variable. Notice that we can only optimize the 'trunc' case		/// induction variable. Notice that we can only optimize the 'trunc' case
/// because: a. FP conversions lose precision, b. sext/zext may wrap,		/// because: a. FP conversions lose precision, b. sext/zext may wrap,
/// c. other casts depend on pointer size.		/// c. other casts depend on pointer size.

if (CI->getOperand(0) == OldInduction &&		if (CI->getOperand(0) == OldInduction &&
it->getOpcode() == Instruction::Trunc) {		it->getOpcode() == Instruction::Trunc) {
Value *ScalarCast =
Builder.CreateCast(CI->getOpcode(), Induction, CI->getType());
Value *Broadcasted = getBroadcastInstrs(ScalarCast);
InductionDescriptor II =		InductionDescriptor II =
Legal->getInductionVars()->lookup(OldInduction);		Legal->getInductionVars()->lookup(OldInduction);
Constant *Step = ConstantInt::getSigned(		if (auto StepValue = II.getConstIntStepValue()) {
CI->getType(), II.getStepValue()->getSExtValue());		StepValue = ConstantInt::getSigned(cast<IntegerType>(CI->getType()),
		StepValue->getSExtValue());
		Value *ScalarCast = Builder.CreateCast(CI->getOpcode(), Induction,
		CI->getType());
		Value *Broadcasted = getBroadcastInstrs(ScalarCast);
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part] = getStepVector(Broadcasted, VF * Part, Step);		Entry[Part] = getStepVector(Broadcasted, VF * Part, StepValue);
addMetadata(Entry, &*it);		addMetadata(Entry, &*it);
break;		break;
}		}
		}
/// Vectorize casts.		/// Vectorize casts.
Type *DestTy =		Type *DestTy =
(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);		(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);

VectorParts &A = getVectorValue(it->getOperand(0));		VectorParts &A = getVectorValue(it->getOperand(0));
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part] = Builder.CreateCast(CI->getOpcode(), A[Part], DestTy);		Entry[Part] = Builder.CreateCast(CI->getOpcode(), A[Part], DestTy);
addMetadata(Entry, &*it);		addMetadata(Entry, &*it);
(lines elided)	bool LoopVectorizationLegality::addInductionPhi(PHINode *Phi,
// Get the widest type.		// Get the widest type.
if (!WidestIndTy)		if (!WidestIndTy)
WidestIndTy = convertPointerToIntegerType(DL, PhiTy);		WidestIndTy = convertPointerToIntegerType(DL, PhiTy);
else		else
WidestIndTy = getWiderType(DL, PhiTy, WidestIndTy);		WidestIndTy = getWiderType(DL, PhiTy, WidestIndTy);

// Int inductions are special because we only allow one IV.		// Int inductions are special because we only allow one IV.
if (ID.getKind() == InductionDescriptor::IK_IntInduction &&		if (ID.getKind() == InductionDescriptor::IK_IntInduction &&
ID.getStepValue()->isOne() &&		ID.getConstIntStepValue() &&
		ID.getConstIntStepValue()->isOne() &&
isa<Constant>(ID.getStartValue()) &&		isa<Constant>(ID.getStartValue()) &&
cast<Constant>(ID.getStartValue())->isNullValue()) {		cast<Constant>(ID.getStartValue())->isNullValue()) {

// Use the phi node with the widest type as induction. Use the last		// Use the phi node with the widest type as induction. Use the last
// one if there are multiple (no good reason for doing this other		// one if there are multiple (no good reason for doing this other
// than it is expedient). We've checked that it begins at zero and		// than it is expedient). We've checked that it begins at zero and
// steps by one, so this is a canonical induction variable.		// steps by one, so this is a canonical induction variable.
if (!Induction || PhiTy == WidestIndTy)		if (!Induction || PhiTy == WidestIndTy)
Induction = Phi;		Induction = Phi;
}		}

(lines elided)	void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {

return scalarizeInstruction(Instr, IfPredicateStore);		return scalarizeInstruction(Instr, IfPredicateStore);
}		}

Value *InnerLoopUnroller::reverseVector(Value *Vec) { return Vec; }		Value *InnerLoopUnroller::reverseVector(Value *Vec) { return Vec; }

Value *InnerLoopUnroller::getBroadcastInstrs(Value *V) { return V; }		Value *InnerLoopUnroller::getBroadcastInstrs(Value *V) { return V; }

		Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx,
		const SCEV *StepSCEV) {
		const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();
		SCEVExpander Exp(*PSE.getSE(), DL, "induction");
		Value *StepValue = Exp.expandCodeFor(StepSCEV, StepSCEV->getType(),
		&*Builder.GetInsertPoint());
		return getStepVector(Val, StartIdx, StepValue);
		}

Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx, Value *Step) {		Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx, Value *Step) {
// When unrolling and the VF is 1, we only need to add a simple scalar.		// When unrolling and the VF is 1, we only need to add a simple scalar.
Type *ITy = Val->getType();		Type *ITy = Val->getType();
assert(!ITy->isVectorTy() && "Val must be a scalar");		assert(!ITy->isVectorTy() && "Val must be a scalar");
Constant *C = ConstantInt::get(ITy, StartIdx);		Constant *C = ConstantInt::get(ITy, StartIdx);
return Builder.CreateAdd(Val, Builder.CreateMul(C, Step), "induction");		return Builder.CreateAdd(Val, Builder.CreateMul(C, Step), "induction");
}		}

llvm/trunk/test/Transforms/LoopVectorize/induction-step.ll

				; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=8 -S | FileCheck %s

				; int int_inc;
				;
				;int induction_with_global(int init, int *restrict A, int N) {
				; int x = init;
				; for (int i=0;i<N;i++){
				; A[i] = x;
				; x += int_inc;
				; }
				; return x;
				;}

				; CHECK-LABEL: @induction_with_global(
				; CHECK: %[[INT_INC:.*]] = load i32, i32* @int_inc, align 4
				; CHECK: vector.body:
				; CHECK: %[[VAR1:.*]] = insertelement <8 x i32> undef, i32 %[[INT_INC]], i32 0
				; CHECK: %[[VAR2:.*]] = shufflevector <8 x i32> %[[VAR1]], <8 x i32> undef, <8 x i32> zeroinitializer
				; CHECK: mul <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, %[[VAR2]]

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"


				@int_inc = common global i32 0, align 4

				define i32 @induction_with_global(i32 %init, i32* noalias nocapture %A, i32 %N) {
				entry:
				%cmp4 = icmp sgt i32 %N, 0
				br i1 %cmp4, label %for.body.lr.ph, label %for.end

				for.body.lr.ph: ; preds = %entry
				%0 = load i32, i32* @int_inc, align 4
				%1 = mul i32 %0, %N
				br label %for.body

				for.body: ; preds = %for.body, %for.body.lr.ph
				%indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %for.body ]
				%x.05 = phi i32 [ %init, %for.body.lr.ph ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				store i32 %x.05, i32* %arrayidx, align 4
				%add = add nsw i32 %0, %x.05
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %N
				br i1 %exitcond, label %for.end.loopexit, label %for.body

				for.end.loopexit: ; preds = %for.body
				%2 = add i32 %1, %init
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %entry
				%x.0.lcssa = phi i32 [ %init, %entry ], [ %2, %for.end.loopexit ]
				ret i32 %x.0.lcssa
				}


				;int induction_with_loop_inv(int init, int *restrict A, int N, int M) {
				; int x = init;
				; for (int j = 0; j < M; j++) {
				; for (int i=0; i<N; i++){
				; A[i] = x;
				; x += j; // induction step is a loop invariant variable
				; }
				; }
				; return x;
				;}

				; CHECK-LABEL: @induction_with_loop_inv(
				; CHECK: for.cond1.preheader:
				; CHECK: %[[INDVAR0:.*]] = phi i32 [ 0,
				; CHECK: %[[INDVAR1:.*]] = phi i32 [ 0,
				; CHECK: vector.body:
				; CHECK: %[[VAR1:.*]] = insertelement <8 x i32> undef, i32 %[[INDVAR1]], i32 0
				; CHECK: %[[VAR2:.*]] = shufflevector <8 x i32> %[[VAR1]], <8 x i32> undef, <8 x i32> zeroinitializer
				; CHECK: mul <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, %[[VAR2]]

				define i32 @induction_with_loop_inv(i32 %init, i32* noalias nocapture %A, i32 %N, i32 %M) {
				entry:
				%cmp10 = icmp sgt i32 %M, 0
				br i1 %cmp10, label %for.cond1.preheader.lr.ph, label %for.end6

				for.cond1.preheader.lr.ph: ; preds = %entry
				%cmp27 = icmp sgt i32 %N, 0
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.inc4, %for.cond1.preheader.lr.ph
				%indvars.iv15 = phi i32 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next16, %for.inc4 ]
				%j.012 = phi i32 [ 0, %for.cond1.preheader.lr.ph ], [ %inc5, %for.inc4 ]
				%x.011 = phi i32 [ %init, %for.cond1.preheader.lr.ph ], [ %x.1.lcssa, %for.inc4 ]
				br i1 %cmp27, label %for.body3.preheader, label %for.inc4

				for.body3.preheader: ; preds = %for.cond1.preheader
				br label %for.body3

				for.body3: ; preds = %for.body3.preheader, %for.body3
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 0, %for.body3.preheader ]
				%x.18 = phi i32 [ %add, %for.body3 ], [ %x.011, %for.body3.preheader ]
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				store i32 %x.18, i32* %arrayidx, align 4
				%add = add nsw i32 %x.18, %j.012
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %N
				br i1 %exitcond, label %for.inc4.loopexit, label %for.body3

				for.inc4.loopexit: ; preds = %for.body3
				%0 = add i32 %x.011, %indvars.iv15
				br label %for.inc4

				for.inc4: ; preds = %for.inc4.loopexit, %for.cond1.preheader
				%x.1.lcssa = phi i32 [ %x.011, %for.cond1.preheader ], [ %0, %for.inc4.loopexit ]
				%inc5 = add nuw nsw i32 %j.012, 1
				%indvars.iv.next16 = add i32 %indvars.iv15, %N
				%exitcond17 = icmp eq i32 %inc5, %M
				br i1 %exitcond17, label %for.end6.loopexit, label %for.cond1.preheader

				for.end6.loopexit: ; preds = %for.inc4
				%x.1.lcssa.lcssa = phi i32 [ %x.1.lcssa, %for.inc4 ]
				br label %for.end6

				for.end6: ; preds = %for.end6.loopexit, %entry
				%x.0.lcssa = phi i32 [ %init, %entry ], [ %x.1.lcssa.lcssa, %for.end6.loopexit ]
				ret i32 %x.0.lcssa
				}

Revision Contents

Diff 56667

llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h

llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/trunk/test/Transforms/LoopVectorize/induction-step.ll
