This is an archive of the discontinued LLVM Phabricator instance.

[LV] Enable vectorization of loops where the IV has an external use
ClosedPublic

Authored by mkuper on Jun 6 2016, 5:32 PM.

Details

Reviewers

delena
wmi
jmolloy
mssimpso
hfinkel

Commits

rG23b6d6adc9dd: [LV] Enable vectorization of loops where the IV has an external use
rL272715: [LV] Enable vectorization of loops where the IV has an external use

Summary

Vectorizing loops with "escaping" IVs has been disabled since it was discovered to not work correctly (PR17179).
This patch re-enables it, with support for external use of both "pre-increment" and "post-increment" (that is, last and second-to-last iteration) IVs.
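To make the two kinds of escaping value concrete, here is a minimal C++ sketch. It is not the vectorizer's code: it models a loop of the same do-while shape as the tests in this patch, plus a hypothetical `vectorizedModel` that splits the trip count into a vector part and a scalar remainder. The names `Escaped`, `scalarLoop`, `vectorizedModel`, and `VF` are made up for illustration; only the formulas Start + Step * CRD and Start + Step * (CRD - 1) come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// The two values of an induction variable that may be used after the loop.
struct Escaped {
  int64_t PreInc;  // value of the IV phi in the final iteration
  int64_t PostInc; // value fed back into the phi from the loop latch
};

// Reference semantics: a loop that runs exactly N >= 1 iterations.
Escaped scalarLoop(int64_t Start, int64_t Step, int64_t N) {
  assert(N >= 1 && "do-while style loops run at least once");
  int64_t Phi = Start;
  int64_t Inc = Start;
  for (int64_t I = 0; I < N; ++I) {
    Phi = Inc;        // the phi takes Start on entry, the latch value afterwards
    Inc = Phi + Step; // the post-increment value
  }
  return {Phi, Inc};
}

// Model of the vectorized structure (assumption: no interleaving).
// CRD is the number of iterations covered by the vector loop.
Escaped vectorizedModel(int64_t Start, int64_t Step, int64_t N, int64_t VF) {
  assert(N >= 1 && VF >= 1);
  int64_t CRD = N - N % VF;
  if (CRD == N) {
    // Exit straight from the middle block: no scalar iteration runs, so the
    // external users need values computed from CRD alone.
    return {Start + Step * (CRD - 1), Start + Step * CRD};
  }
  // Otherwise the remainder loop resumes from the end value of the vector
  // loop (or from Start when the vector loop was bypassed, CRD == 0).
  return scalarLoop(Start + Step * CRD, Step, N - CRD);
}
```

Under these assumptions both functions return the same pair for every trip count, which is the property the middle-block fix-up has to preserve.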

Diff Detail

Repository: rL LLVM

Event Timeline

mkuper updated this revision to Diff 59811. Jun 6 2016, 5:32 PM

mkuper retitled this revision to [LV] Enable vectorization of loops where the IV has an external use.

mkuper updated this object.

mkuper added reviewers: delena, hfinkel, jmolloy, wmi, mssimpso.

mkuper added subscribers: nadav, danielcdh, davidxl, llvm-commits.

Herald added a subscriber: mzolotukhin. Jun 6 2016, 5:32 PM

ping

LGTM.

wmi added inline comments. Jun 14 2016, 11:36 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
3301 ↗	(On Diff #59811)	Why is PrevValue necessary? In which case can OrigPhi->users() have more than one use outside the loop?
4806 ↗	(On Diff #59811)	addInductionPhi will return true anyway now. So maybe change its return val to void and remove the if?

mkuper added inline comments. Jun 14 2016, 12:38 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
3301 ↗	(On Diff #59811)	I think you're right. LCSSA should canonicalize this to one phi per exit block, and we only vectorize loops with a single exit block right now. I'll change it, thanks!
4806 ↗	(On Diff #59811)	Yes, of course, I didn't notice I removed the only false path.

Updated per Wei's comments.

wmi accepted this revision. Jun 14 2016, 1:53 PM

wmi edited edge metadata.

This revision is now accepted and ready to land. Jun 14 2016, 1:53 PM

Closed by commit rL272715: [LV] Enable vectorization of loops where the IV has an external use (authored by mkuper). Jun 14 2016, 2:34 PM

This revision was automatically updated to reflect the committed changes.

An alternative which I'm sure you thought of would be to fix/clean up such external users of IV's as a preparatory step (SimplifyIndVar?), eliminating them from the loop before starting to vectorize it. This may be a good thing to do early, for other "uses".

It may be somewhat more efficient to traverse the LCSSA phi's at the single exit block that are fed by allowed-to-exit IV's in order to fix/clean them up, instead of traversing mostly irrelevant internal uses in search for out-of-loop ones.

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
3299–3300	need[s] "simplest" as it employs II.transform, which takes care of pointers as well; one could argue that doing EndValue - Step is simpler..

In D21048#459394, @Ayal wrote:

> An alternative which I'm sure you thought of would be to fix/clean up such external users of IV's as a preparatory step (SimplifyIndVar?), eliminating them from the loop before starting to vectorize it. This may be a good thing to do early, for other "uses".

Yes, in fact, that's what I've started with, but abandoned that direction.
This is a question of cost modeling. SimplifyIndVar will already clean this up if it considers generating the end value cheap enough. And it seems like this decision should not depend on whether it expects vectorization in the future or not.

> It may be somewhat more efficient to traverse the LCSSA phi's at the single exit block that are fed by allowed-to-exit IV's in order to fix/clean them up, instead of traversing mostly irrelevant internal uses in search for out-of-loop ones.

I'm not sure it's much better. If the LCSSA phi uses the IV phi directly, it is. If it uses the value feeding into the IV phi, then we still need to find the IV this value belongs to. So, either have additional book-keeping, or go over the value's uses to find the phi.
If you think it may be significantly better, I can implement it, and see how it looks.

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
3299–3300	Yes, I want this to fire for pointer IVs as well, just enabling it step-by-step.
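As a worked check of the two formulations mentioned in this thread, the sketch below models `II.transform` for an integer induction as Start + Step * Index. That model is an assumption made for illustration (the real helper also handles pointer inductions), and `transformModel` and `escapeValue` are hypothetical names. The numbers are taken from the `constpre` and `geppre` tests in this patch, assuming the vector loop covers every iteration so that CRD equals the trip count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an integer induction: Start + Step * Index.
int64_t transformModel(int64_t Start, int64_t Step, int64_t Index) {
  return Start + Step * Index;
}

// Value an external user of the IV phi should see when the loop is left
// through the middle block: Start + Step * (CRD - 1).
int64_t escapeValue(int64_t Start, int64_t Step, int64_t CRD) {
  assert(CRD >= 1 && "the vector loop ran at least once");
  return transformModel(Start, Step, CRD - 1);
}
```

For `constpre` (Start 32, Step -2, 16 iterations) this gives `escapeValue(32, -2, 16) == 2`, matching the `[ 2, %middle.block ]` incoming value the test checks for. For `geppre`, counting the pointer offset in `i32` elements (Start 0, Step 4, 32 iterations), it gives 124, matching `getelementptr i32, i32* %ptr, i64 124`. For integer IVs the result is the same as EndValue - Step, where EndValue is `transformModel(Start, Step, CRD)`.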

In D21048#459423, @mkuper wrote:

> In D21048#459394, @Ayal wrote:
>
>> An alternative which I'm sure you thought of would be to fix/clean up such external users of IV's as a preparatory step (SimplifyIndVar?), eliminating them from the loop before starting to vectorize it. This may be a good thing to do early, for other "uses".
>
> Yes, in fact, that's what I've started with, but abandoned that direction.
> This is a question of cost modeling. SimplifyIndVar will already clean this up if it considers generating the end value cheap enough. And it seems like this decision should not depend on whether it expects vectorization in the future or not.

It could potentially help other uses as well, but ok, they'd be hard to anticipate as well. The alternative was firstly referring to cleaning this up SimplifyIndVar-style after we decide to vectorize the loop and before we start creating an empty loop etc.

>> It may be somewhat more efficient to traverse the LCSSA phi's at the single exit block that are fed by allowed-to-exit IV's in order to fix/clean them up, instead of traversing mostly irrelevant internal uses in search for out-of-loop ones.
>
> I'm not sure it's much better. If the LCSSA phi uses the IV phi directly, it is. If it uses the value feeding into the IV phi, then we still need to find the IV this value belongs to. So, either have additional book-keeping, or go over the value's uses to find the phi.

or find the phi by looking at the defs feeding this value.

> If you think it may be significantly better, I can implement it, and see how it looks.

Ah, I would expect this to have negligible effect if any. Just noted to keep in mind if one does go back to implement the SimplifyIndVar alternative.

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
3299–3300	BTW, another alternative is to extract the last element from the vectorized IV; or the element before last. But that is less amenable to further passes than the scalar computation.
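The lane-extraction alternative can be illustrated with a small model. This is only a sketch: it assumes a plain integer IV, no interleaving, and a hypothetical `lastVectorIV` helper that lists the lanes of the IV phi in the final vector iteration. Under those assumptions the last lane equals Start + Step * (CRD - 1), the same value the scalar recomputation in the patch produces.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lanes of the (pre-increment) IV in the final iteration of the vector loop.
// CRD is the number of scalar iterations the vector loop covers and VF is
// the vector width; the final vector iteration starts at index CRD - VF.
std::vector<int64_t> lastVectorIV(int64_t Start, int64_t Step, int64_t CRD,
                                  int64_t VF) {
  assert(VF > 0 && CRD >= VF && CRD % VF == 0);
  std::vector<int64_t> Lanes;
  for (int64_t Lane = 0; Lane < VF; ++Lane)
    Lanes.push_back(Start + Step * (CRD - VF + Lane));
  return Lanes;
}
```

With Start 0, Step 1, CRD 8, and VF 4 the lanes are 4, 5, 6, 7, so the last lane is 7 = 0 + 1 * (8 - 1). The scalar form has the advantage noted in the comment: it is an ordinary expression of CRD that later passes can fold, whereas an extracted lane depends on the vector value.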

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp	106 lines
llvm/trunk/test/Transforms/LoopVectorize/iv_outside_user.ll	84 lines
llvm/trunk/test/Transforms/LoopVectorize/no_outside_user.ll	32 lines

Diff 60757

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 349 Lines • ▼ Show 20 Lines	protected:

// When we if-convert we need to create edge masks. We have to cache values		// When we if-convert we need to create edge masks. We have to cache values
// so that we don't end up with exponential recursion/IR.		// so that we don't end up with exponential recursion/IR.
typedef DenseMap<std::pair<BasicBlock *, BasicBlock *>, VectorParts>		typedef DenseMap<std::pair<BasicBlock *, BasicBlock *>, VectorParts>
EdgeMaskCache;		EdgeMaskCache;

/// Create an empty loop, based on the loop ranges of the old loop.		/// Create an empty loop, based on the loop ranges of the old loop.
void createEmptyLoop();		void createEmptyLoop();

		/// Set up the values of the IVs correctly when exiting the vector loop.
		void fixupIVUsers(PHINode *OrigPhi, const InductionDescriptor &II,
		Value *CountRoundDown, Value *EndValue,
		BasicBlock *MiddleBlock);

/// Create a new induction variable inside L.		/// Create a new induction variable inside L.
PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,		PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,
Value *Step, Instruction *DL);		Value *Step, Instruction *DL);
/// Copy and widen the instructions from the old loop.		/// Copy and widen the instructions from the old loop.
virtual void vectorizeLoop();		virtual void vectorizeLoop();

/// Fix a first-order recurrence. This is the second phase of vectorizing		/// Fix a first-order recurrence. This is the second phase of vectorizing
/// this phi node.		/// this phi node.
▲ Show 20 Lines • Show All 1,062 Lines • ▼ Show 20 Lines	private:
bool blockCanBePredicated(BasicBlock *BB, SmallPtrSetImpl<Value *> &SafePtrs);		bool blockCanBePredicated(BasicBlock *BB, SmallPtrSetImpl<Value *> &SafePtrs);

/// \brief Collect memory access with loop invariant strides.		/// \brief Collect memory access with loop invariant strides.
///		///
/// Looks for accesses like "a[i * StrideA]" where "StrideA" is loop		/// Looks for accesses like "a[i * StrideA]" where "StrideA" is loop
/// invariant.		/// invariant.
void collectStridedAccess(Value *LoadOrStoreInst);		void collectStridedAccess(Value *LoadOrStoreInst);

/// \brief Returns true if we can vectorize using this PHI node as an
/// induction.
///
/// Updates the vectorization state by adding \p Phi to the inductions list.		/// Updates the vectorization state by adding \p Phi to the inductions list.
/// This can set \p Phi as the main induction of the loop if \p Phi is a		/// This can set \p Phi as the main induction of the loop if \p Phi is a
/// better choice for the main induction than the existing one.		/// better choice for the main induction than the existing one.
bool addInductionPhi(PHINode *Phi, InductionDescriptor ID);		void addInductionPhi(PHINode *Phi, InductionDescriptor ID,
		SmallPtrSetImpl<Value *> &AllowedExit);

/// Report an analysis message to assist the user in diagnosing loops that are		/// Report an analysis message to assist the user in diagnosing loops that are
/// not vectorized. These are handled as LoopAccessReport rather than		/// not vectorized. These are handled as LoopAccessReport rather than
/// VectorizationReport because the << operator of VectorizationReport returns		/// VectorizationReport because the << operator of VectorizationReport returns
/// LoopAccessReport.		/// LoopAccessReport.
void emitAnalysis(const LoopAccessReport &Message) const {		void emitAnalysis(const LoopAccessReport &Message) const {
emitAnalysisDiag(TheFunction, TheLoop, *Hints, Message);		emitAnalysisDiag(TheFunction, TheLoop, *Hints, Message);
}		}
Show All 37 Lines	private:
/// Notice that inductions don't need to start at zero and that induction		/// Notice that inductions don't need to start at zero and that induction
/// variables can be pointers.		/// variables can be pointers.
InductionList Inductions;		InductionList Inductions;
/// Holds the phi nodes that are first-order recurrences.		/// Holds the phi nodes that are first-order recurrences.
RecurrenceSet FirstOrderRecurrences;		RecurrenceSet FirstOrderRecurrences;
/// Holds the widest induction type encountered.		/// Holds the widest induction type encountered.
Type *WidestIndTy;		Type *WidestIndTy;

/// Allowed outside users. This holds the reduction		/// Allowed outside users. This holds the induction and reduction
/// vars which can be accessed from outside the loop.		/// vars which can be accessed from outside the loop.
SmallPtrSet<Value *, 4> AllowedExit;		SmallPtrSet<Value *, 4> AllowedExit;
/// This set holds the variables which are known to be uniform after		/// This set holds the variables which are known to be uniform after
/// vectorization.		/// vectorization.
SmallPtrSet<Instruction *, 4> Uniforms;		SmallPtrSet<Instruction *, 4> Uniforms;

/// Can we assume the absence of NaNs.		/// Can we assume the absence of NaNs.
bool HasFunNoNaNAttr;		bool HasFunNoNaNAttr;
▲ Show 20 Lines • Show All 1,709 Lines • ▼ Show 20 Lines	if (OrigPhi == OldInduction) {
EndValue = II.transform(B, CRD, PSE.getSE(), DL);		EndValue = II.transform(B, CRD, PSE.getSE(), DL);
EndValue->setName("ind.end");		EndValue->setName("ind.end");
}		}

// The new PHI merges the original incoming value, in case of a bypass,		// The new PHI merges the original incoming value, in case of a bypass,
// or the value at the end of the vectorized loop.		// or the value at the end of the vectorized loop.
BCResumeVal->addIncoming(EndValue, MiddleBlock);		BCResumeVal->addIncoming(EndValue, MiddleBlock);

		// Fix up external users of the induction variable.
		fixupIVUsers(OrigPhi, II, CountRoundDown, EndValue, MiddleBlock);

// Fix the scalar body counter (PHI node).		// Fix the scalar body counter (PHI node).
unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH);		unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH);

// The old induction's phi node in the scalar body needs the truncated		// The old induction's phi node in the scalar body needs the truncated
// value.		// value.
for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I)		for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I)
BCResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[I]);		BCResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[I]);
OrigPhi->setIncomingValue(BlockIdx, BCResumeVal);		OrigPhi->setIncomingValue(BlockIdx, BCResumeVal);
Show All 23 Lines	void InnerLoopVectorizer::createEmptyLoop() {
// replace the vectorizer-specific hints below).		// replace the vectorizer-specific hints below).
if (MDNode *LID = OrigLoop->getLoopID())		if (MDNode *LID = OrigLoop->getLoopID())
Lp->setLoopID(LID);		Lp->setLoopID(LID);

LoopVectorizeHints Hints(Lp, true);		LoopVectorizeHints Hints(Lp, true);
Hints.setAlreadyVectorized();		Hints.setAlreadyVectorized();
}		}

		// Fix up external users of the induction variable. At this point, we are
		// in LCSSA form, with all external PHIs that use the IV having one input value,
		// coming from the remainder loop. We need those PHIs to also have a correct
		// value for the IV when arriving directly from the middle block.
		void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
		const InductionDescriptor &II,
		Value *CountRoundDown, Value *EndValue,
		BasicBlock *MiddleBlock) {
		// There are two kinds of external IV usages - those that use the value
		// computed in the last iteration (the PHI) and those that use the penultimate
		// value (the value that feeds into the phi from the loop latch).
		// We allow both, but they, obviously, have different values.

		// We only expect at most one of each kind of user. This is because LCSSA will
		// canonicalize the users to a single PHI node per exit block, and we
		// currently only vectorize loops with a single exit.
		assert(OrigLoop->getExitBlock() && "Expected a single exit block");

		// An external user of the last iteration's value should see the value that
		// the remainder loop uses to initialize its own IV.
		Value *PostInc = OrigPhi->getIncomingValueForBlock(OrigLoop->getLoopLatch());
		for (User *U : PostInc->users()) {
		Instruction *UI = cast<Instruction>(U);
		if (!OrigLoop->contains(UI)) {
		assert(isa<PHINode>(UI) && "Expected LCSSA form");
		cast<PHINode>(UI)->addIncoming(EndValue, MiddleBlock);
		break;
		}
		}

		// An external user of the penultimate value need to see EndValue - Step.
		// The simplest way to get this is to recompute it from the constituent SCEVs,
		// that is Start + (Step * (CRD - 1)).
		for (User *U : OrigPhi->users()) {
		Instruction *UI = cast<Instruction>(U);
		if (!OrigLoop->contains(UI)) {
		assert(isa<PHINode>(UI) && "Expected LCSSA form");
		const DataLayout &DL =
		OrigLoop->getHeader()->getModule()->getDataLayout();

		IRBuilder<> B(MiddleBlock->getTerminator());
		Value *CountMinusOne = B.CreateSub(
		CountRoundDown, ConstantInt::get(CountRoundDown->getType(), 1));
		Value *CMO = B.CreateSExtOrTrunc(CountMinusOne, II.getStep()->getType(),
		"cast.cmo");
		Value *Escape = II.transform(B, CMO, PSE.getSE(), DL);
		Escape->setName("ind.escape");
		cast<PHINode>(UI)->addIncoming(Escape, MiddleBlock);
		break;
		}
		}
		}

namespace {		namespace {
struct CSEDenseMapInfo {		struct CSEDenseMapInfo {
static bool canHandle(Instruction *I) {		static bool canHandle(Instruction *I) {
return isa<InsertElementInst>(I) || isa<ExtractElementInst>(I) ||		return isa<InsertElementInst>(I) || isa<ExtractElementInst>(I) ||
isa<ShuffleVectorInst>(I) || isa<GetElementPtrInst>(I);		isa<ShuffleVectorInst>(I) || isa<GetElementPtrInst>(I);
}		}
static inline Instruction *getEmptyKey() {		static inline Instruction *getEmptyKey() {
return DenseMapInfo<Instruction *>::getEmptyKey();		return DenseMapInfo<Instruction *>::getEmptyKey();
▲ Show 20 Lines • Show All 1,365 Lines • ▼ Show 20 Lines	static Type *getWiderType(const DataLayout &DL, Type *Ty0, Type *Ty1) {
if (Ty0->getScalarSizeInBits() > Ty1->getScalarSizeInBits())		if (Ty0->getScalarSizeInBits() > Ty1->getScalarSizeInBits())
return Ty0;		return Ty0;
return Ty1;		return Ty1;
}		}

/// \brief Check that the instruction has outside loop users and is not an		/// \brief Check that the instruction has outside loop users and is not an
/// identified reduction variable.		/// identified reduction variable.
static bool hasOutsideLoopUser(const Loop *TheLoop, Instruction *Inst,		static bool hasOutsideLoopUser(const Loop *TheLoop, Instruction *Inst,
SmallPtrSetImpl<Value *> &Reductions) {		SmallPtrSetImpl<Value *> &AllowedExit) {
// Reduction instructions are allowed to have exit users. All other		// Reduction and Induction instructions are allowed to have exit users. All
// instructions must not have external users.		// other instructions must not have external users.
if (!Reductions.count(Inst))		if (!AllowedExit.count(Inst))
// Check that all of the users of the loop are inside the BB.		// Check that all of the users of the loop are inside the BB.
for (User *U : Inst->users()) {		for (User *U : Inst->users()) {
Instruction *UI = cast<Instruction>(U);		Instruction *UI = cast<Instruction>(U);
// This user may be a reduction exit value.		// This user may be a reduction exit value.
if (!TheLoop->contains(UI)) {		if (!TheLoop->contains(UI)) {
DEBUG(dbgs() << "LV: Found an outside user for : " << *UI << '\n');		DEBUG(dbgs() << "LV: Found an outside user for : " << *UI << '\n');
return true;		return true;
}		}
}		}
return false;		return false;
}		}

bool LoopVectorizationLegality::addInductionPhi(PHINode *Phi,		void LoopVectorizationLegality::addInductionPhi(
InductionDescriptor ID) {		PHINode *Phi, InductionDescriptor ID,
		SmallPtrSetImpl<Value *> &AllowedExit) {
Inductions[Phi] = ID;		Inductions[Phi] = ID;
Type *PhiTy = Phi->getType();		Type *PhiTy = Phi->getType();
const DataLayout &DL = Phi->getModule()->getDataLayout();		const DataLayout &DL = Phi->getModule()->getDataLayout();

// Get the widest type.		// Get the widest type.
if (!WidestIndTy)		if (!WidestIndTy)
WidestIndTy = convertPointerToIntegerType(DL, PhiTy);		WidestIndTy = convertPointerToIntegerType(DL, PhiTy);
else		else
Show All 9 Lines	if (ID.getKind() == InductionDescriptor::IK_IntInduction &&
// Use the phi node with the widest type as induction. Use the last		// Use the phi node with the widest type as induction. Use the last
// one if there are multiple (no good reason for doing this other		// one if there are multiple (no good reason for doing this other
// than it is expedient). We've checked that it begins at zero and		// than it is expedient). We've checked that it begins at zero and
// steps by one, so this is a canonical induction variable.		// steps by one, so this is a canonical induction variable.
if (!Induction || PhiTy == WidestIndTy)		if (!Induction || PhiTy == WidestIndTy)
Induction = Phi;		Induction = Phi;
}		}

DEBUG(dbgs() << "LV: Found an induction variable.\n");		// Both the PHI node itself, and the "post-increment" value feeding
		// back into the PHI node may have external users.
// Until we explicitly handle the case of an induction variable with		AllowedExit.insert(Phi);
// an outside loop user we have to give up vectorizing this loop.		AllowedExit.insert(Phi->getIncomingValueForBlock(TheLoop->getLoopLatch()));
if (hasOutsideLoopUser(TheLoop, Phi, AllowedExit)) {
emitAnalysis(VectorizationReport(Phi) <<
"use of induction value outside of the "
"loop is not handled by vectorizer");
return false;
}

return true;		DEBUG(dbgs() << "LV: Found an induction variable.\n");
		return;
}		}

bool LoopVectorizationLegality::canVectorizeInstrs() {		bool LoopVectorizationLegality::canVectorizeInstrs() {
BasicBlock *Header = TheLoop->getHeader();		BasicBlock *Header = TheLoop->getHeader();

// Look for the attribute signaling the absence of NaNs.		// Look for the attribute signaling the absence of NaNs.
Function &F = *Header->getParent();		Function &F = *Header->getParent();
HasFunNoNaNAttr =		HasFunNoNaNAttr =
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	for (BasicBlock::iterator it = (bb)->begin(), e = (bb)->end(); it != e;
Requirements->addUnsafeAlgebraInst(RedDes.getUnsafeAlgebraInst());		Requirements->addUnsafeAlgebraInst(RedDes.getUnsafeAlgebraInst());
AllowedExit.insert(RedDes.getLoopExitInstr());		AllowedExit.insert(RedDes.getLoopExitInstr());
Reductions[Phi] = RedDes;		Reductions[Phi] = RedDes;
continue;		continue;
}		}

InductionDescriptor ID;		InductionDescriptor ID;
if (InductionDescriptor::isInductionPHI(Phi, PSE, ID)) {		if (InductionDescriptor::isInductionPHI(Phi, PSE, ID)) {
if (!addInductionPhi(Phi, ID))		addInductionPhi(Phi, ID, AllowedExit);
return false;
continue;		continue;
}		}

if (RecurrenceDescriptor::isFirstOrderRecurrence(Phi, TheLoop, DT)) {		if (RecurrenceDescriptor::isFirstOrderRecurrence(Phi, TheLoop, DT)) {
FirstOrderRecurrences.insert(Phi);		FirstOrderRecurrences.insert(Phi);
continue;		continue;
}		}

// As a last resort, coerce the PHI to a AddRec expression		// As a last resort, coerce the PHI to a AddRec expression
// and re-try classifying it a an induction PHI.		// and re-try classifying it a an induction PHI.
if (InductionDescriptor::isInductionPHI(Phi, PSE, ID, true)) {		if (InductionDescriptor::isInductionPHI(Phi, PSE, ID, true)) {
if (!addInductionPhi(Phi, ID))		addInductionPhi(Phi, ID, AllowedExit);
return false;
continue;		continue;
}		}

emitAnalysis(VectorizationReport(&*it)		emitAnalysis(VectorizationReport(&*it)
<< "value that could not be identified as "		<< "value that could not be identified as "
"reduction is used outside the loop");		"reduction is used outside the loop");
DEBUG(dbgs() << "LV: Found an unidentified PHI." << *Phi << "\n");		DEBUG(dbgs() << "LV: Found an unidentified PHI." << *Phi << "\n");
return false;		return false;

llvm/trunk/test/Transforms/LoopVectorize/iv_outside_user.ll

				; RUN: opt -S -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 < %s | FileCheck %s

				; CHECK-LABEL: @postinc
				; CHECK-LABEL: scalar.ph:
				; CHECK: %bc.resume.val = phi i32 [ %n.vec, %middle.block ], [ 0, %entry ]
				; CHECK-LABEL: for.end:
				; CHECK: %[[RET:.*]] = phi i32 [ {{.*}}, %for.body ], [ %n.vec, %middle.block ]
				; CHECK: ret i32 %[[RET]]
				define i32 @postinc(i32 %k) {
				entry:
				br label %for.body

				for.body:
				%inc.phi = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%inc = add nsw i32 %inc.phi, 1
				%cmp = icmp eq i32 %inc, %k
				br i1 %cmp, label %for.end, label %for.body

				for.end:
				ret i32 %inc
				}

				; CHECK-LABEL: @preinc
				; CHECK-LABEL: middle.block:
				; CHECK: %3 = sub i32 %n.vec, 1
				; CHECK: %ind.escape = add i32 0, %3
				; CHECK-LABEL: scalar.ph:
				; CHECK: %bc.resume.val = phi i32 [ %n.vec, %middle.block ], [ 0, %entry ]
				; CHECK-LABEL: for.end:
				; CHECK: %[[RET:.*]] = phi i32 [ {{.*}}, %for.body ], [ %ind.escape, %middle.block ]
				; CHECK: ret i32 %[[RET]]
				define i32 @preinc(i32 %k) {
				entry:
				br label %for.body

				for.body:
				%inc.phi = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%inc = add nsw i32 %inc.phi, 1
				%cmp = icmp eq i32 %inc, %k
				br i1 %cmp, label %for.end, label %for.body

				for.end:
				ret i32 %inc.phi
				}

				; CHECK-LABEL: @constpre
				; CHECK-LABEL: for.end:
				; CHECK: %[[RET:.*]] = phi i32 [ {{.*}}, %for.body ], [ 2, %middle.block ]
				; CHECK: ret i32 %[[RET]]
				define i32 @constpre() {
				entry:
				br label %for.body

				for.body:
				%inc.phi = phi i32 [ 32, %entry ], [ %inc, %for.body ]
				%inc = sub nsw i32 %inc.phi, 2
				%cmp = icmp eq i32 %inc, 0
				br i1 %cmp, label %for.end, label %for.body

				for.end:
				ret i32 %inc.phi
				}

				; CHECK-LABEL: @geppre
				; CHECK-LABEL: middle.block:
				; CHECK: %ind.escape = getelementptr i32, i32* %ptr, i64 124
				; CHECK-LABEL: for.end:
				; CHECK: %[[RET:.*]] = phi i32* [ {{.*}}, %for.body ], [ %ind.escape, %middle.block ]
				; CHECK: ret i32* %[[RET]]
				define i32* @geppre(i32* %ptr) {
				entry:
				br label %for.body

				for.body:
				%inc.phi = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%ptr.phi = phi i32* [ %ptr, %entry ], [ %inc.ptr, %for.body ]
				%inc = add nsw i32 %inc.phi, 1
				%inc.ptr = getelementptr i32, i32* %ptr.phi, i32 4
				%cmp = icmp eq i32 %inc, 32
				br i1 %cmp, label %for.end, label %for.body

				for.end:
				ret i32* %ptr.phi
				}

llvm/trunk/test/Transforms/LoopVectorize/no_outside_user.ll

; RUN: opt -S -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 < %s 2>&1 | FileCheck %s		; RUN: opt -S -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 < %s 2>&1 | FileCheck %s

; CHECK: remark: {{.*}}: loop not vectorized: value could not be identified as an induction or reduction variable		; CHECK: remark: {{.*}}: loop not vectorized: value could not be identified as an induction or reduction variable
; CHECK: remark: {{.*}}: loop not vectorized: use of induction value outside of the loop is not handled by vectorizer

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S128"		target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S128"

@f = common global i32 0, align 4		@f = common global i32 0, align 4
@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1		@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1
@c = common global i32 0, align 4		@c = common global i32 0, align 4
@a = common global i32 0, align 4		@a = common global i32 0, align 4
@b = common global i32 0, align 4		@b = common global i32 0, align 4
Show All 23 Lines	bb16:
%tmp18 = add nsw i32 %tmp8, 1		%tmp18 = add nsw i32 %tmp8, 1
%tmp19 = icmp slt i32 %tmp18, 4		%tmp19 = icmp slt i32 %tmp18, 4
br i1 %tmp19, label %.lr.ph.i, label %f1.exit.loopexit		br i1 %tmp19, label %.lr.ph.i, label %f1.exit.loopexit

f1.exit.loopexit:		f1.exit.loopexit:
%.lcssa = phi i32 [ %tmp17, %bb16 ]		%.lcssa = phi i32 [ %tmp17, %bb16 ]
ret i32 %.lcssa		ret i32 %.lcssa
}		}

; Don't vectorize this loop. Its phi node (induction variable) has an outside
; loop user. We currently don't handle this case.
; PR17179

; CHECK-LABEL: @test2(
; CHECK-NOT: <2 x

@x1 = common global i32 0, align 4
@x2 = common global i32 0, align 4
@x0 = common global i32 0, align 4

define i32 @test2() {
entry:
store i32 0, i32* @x1, align 4
%0 = load i32, i32* @x0, align 4
br label %for.cond1.preheader

for.cond1.preheader:
%inc7 = phi i32 [ 0, %entry ], [ %inc, %for.cond1.preheader ]
%inc = add nsw i32 %inc7, 1
%cmp = icmp eq i32 %inc, 52
br i1 %cmp, label %for.end5, label %for.cond1.preheader

for.end5:
%inc7.lcssa = phi i32 [ %inc7, %for.cond1.preheader ]
%xor = xor i32 %inc7.lcssa, %0
store i32 52, i32* @x1, align 4
store i32 1, i32* @x2, align 4
ret i32 %xor
}