This is an archive of the discontinued LLVM Phabricator instance.

[LV] Never widen an induction variable.
Closed, Public

Authored by jmolloy on Aug 24 2015, 9:01 AM.

Details

Reviewers

anemet
mzolotukhin

Summary

There's no need to widen canonical induction variables. It's just as efficient to create a *new*, wide, induction variable.

Consider: if we widen an indvar, we have to truncate it before its uses anyway (1 trunc). If we create a new indvar instead, we have to truncate that one instead (1 trunc). Besides, IndVars should go and clean up our mess after us anyway, on principle.

This lets us remove a ton of special-casing code.
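
The trunc-counting argument can be illustrated with a scalar analogue. This is a minimal sketch in plain C++, not vectorizer code: both function names are hypothetical. A fresh 64-bit counter (standing in for the new wide index) drives the loop, and the value of the original 32-bit variable is recovered with a single truncation, which is the same cost as truncating a widened copy of the original variable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar analogue: the original loop counted with a 32-bit
// variable; here a new 64-bit counter drives the loop instead.
uint64_t sum_with_wide_index(uint64_t n) {
  assert(n <= UINT32_MAX && "the narrow counter must not wrap");
  uint64_t total = 0;
  for (uint64_t index = 0; index < n; ++index) {
    // One truncation recovers the value the narrow counter would have had.
    uint32_t i = static_cast<uint32_t>(index);
    total += i;
  }
  return total;
}

// Reference version that counts directly in the narrow type.
uint64_t sum_with_narrow_index(uint32_t n) {
  uint64_t total = 0;
  for (uint32_t i = 0; i < n; ++i)
    total += i;
  return total;
}
```

Both functions return the same result for any count that fits in 32 bits, so nothing is lost by leaving the narrow variable alone and introducing a wide one next to it.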

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 32964. Aug 24 2015, 9:01 AM

jmolloy retitled this revision to [LV] Never widen an induction variable.

jmolloy updated this object.

jmolloy added reviewers: anemet, mzolotukhin.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

Please upload with full context. Because it's on top of the previous patch, I tried to review this but it's basically impossible.

I have a few early comments/questions though:

lib/Transforms/Vectorize/LoopVectorize.cpp
2683–2692	Is this becoming dead as a consequence of not widening indvars or was this code dead even before? I think it's the latter in which case it should be a separate patch to minimize confusion.
3416–3417	These steps seem unnecessary if P == OldInduction, no?

Hi Adam,

Rebased and uploaded with full context, with some of your comments addressed.

You mentioned the removal of ExtendedIdx in D12285 - actually it's this revision that makes ExtendedIdx redundant, not D12285. Accordingly I've squashed its removal into this patch now.

Cheers,

James

jmolloy added inline comments. Aug 30 2015, 3:10 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2683–2692	(now refers to L2174) This is becoming dead as part of this patch. This is the patch that enforces that IdxTy is no different from Induction->getType(). The previous patch only enforced that Induction->getStart() == 0.
3406–3407	(Now refers to L3477) Yes, it does. It's harmless, and I sometimes prefer to remove special-cases when they make no difference, but I've added the if back in because I think you're right, it makes the flow more obvious in this case.

anemet added inline comments. Aug 31 2015, 5:02 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
470–471	It's pretty strange to store the start idx when it's always zero...
2616–2618	This logic belongs to LoopVectorizationLegality::canVectorizeInstrs together with all the other conditions.
2683–2692	I don't think so. Count is derived from this: const SCEV *BackedgeTakeCount = SE->getNoopOrZeroExtend(ExitCount, IdxTy); What worries me is that if this code weren't dead and this comment were true: // The exit count can be of pointer type. Convert it to the correct // integer type. then the code after the patch wouldn't handle this case.
3482–3483	Aren't we subtracting zero here now? Does this mean that we don't need to stash StartIdx?

jmolloy added inline comments. Sep 1 2015, 7:33 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2683–2692	Hi Adam, This was a very good catch. It turns out there are no regression tests that can trigger this code, but 3 tests in the test-suite do. It's possible for Count to have pointer type. But in fact, we calculate exactly the same value above (ExitCountValue) and test that for pointer-typedness. "Count" isn't needed. So I'll remove this code, use ExitCountValue instead of Count, and I'll also add a new testcase to make sure this case is triggered.
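
Once ExitCountValue replaces Count, it feeds the bypass arithmetic visible in the diff (min.iters.check, n.mod.vf, n.vec). The following is a minimal sketch of that arithmetic with plain integers, assuming the start index is zero as in this patch. The struct and function names are hypothetical and only mirror the value names in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the values the bypass code computes.
struct VectorTripCounts {
  bool bypass_vector_loop;  // models min.iters.check: count < VF * UF
  uint64_t n_mod_vf;        // models n.mod.vf: count % (VF * UF)
  uint64_t n_vec;           // models n.vec: count - n.mod.vf
};

// Splits a trip count into the part run by the vector loop and the remainder.
VectorTripCounts split_trip_count(uint64_t count, uint64_t vf, uint64_t uf) {
  assert(vf > 0 && uf > 0 && "factors must be non-zero");
  const uint64_t step = vf * uf;  // the vector loop advances by VF * UF
  VectorTripCounts r;
  r.bypass_vector_loop = count < step;
  r.n_mod_vf = count % step;
  r.n_vec = count - r.n_mod_vf;
  return r;
}
```

With a count of 10, VF = 4 and UF = 1, the vector loop covers 8 iterations and 2 are left for the scalar loop. With a start index of zero, the cmp.zero test in the diff (rounded-down end index equal to the start index) is true exactly when n_vec is 0.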

Hi Adam,

All your comments should be fixed now. I've also added a test (which asserted when I ran it against my original patch).

Thanks!

James

LGTM with some changes below.

Thanks for your patience!

lib/Transforms/Vectorize/LoopVectorize.cpp
2683–2692	Great! Also just to note what I missed about getNoopOrZeroExtend: it returns the original type (e.g. pointer type) if IdxTy and the original types have the same size. The function comment is misleading in SCEV. I'll fix this later.
test/Transforms/LoopVectorize/ptr-induction.ll
1–3	Doesn't this test need a datalayout to ensure that pointers are 64-bit?
3–6	So I guess you're checking here that we have converted this into an integer induction variable? Perhaps a comment would be good.
17	When I tried this with my not completely up-to-date sources, vectorization failed with: LV: Found an unidentified PHI. %acc.07 = phi i32 [ %add, %while.body ], [ 0, %while.body.preheader ] I think we should just remove this and its uses further down.
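
On the getNoopOrZeroExtend remark in the 2683–2692 comment above: the behaviour described (the original type, possibly a pointer type, comes back unchanged when the sizes match) can be pictured with a toy model. This is a sketch only. ToyType and noop_or_zero_extend are hypothetical stand-ins and not the ScalarEvolution API.

```cpp
#include <cassert>

// Toy stand-in for a type: just the two properties the sketch needs.
struct ToyType {
  bool is_pointer;
  unsigned bits;
};

// Toy model of the behaviour described in the comment above.
// Equal widths: no conversion happens, so the source type is kept,
// even if it is a pointer type. Narrower source: the result takes the
// wider destination type.
ToyType noop_or_zero_extend(ToyType src, ToyType dst) {
  assert(src.bits <= dst.bits && "this operation never narrows");
  if (src.bits == dst.bits)
    return src;
  return dst;
}
```

In this model a 64-bit pointer "extended" to a 64-bit integer type is still a pointer afterwards, which is why a separate pointer-to-integer cast is still needed in that case.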

This revision is now accepted and ready to land. Sep 2 2015, 12:11 AM

Committed r246631. Thanks!

test/Transforms/LoopVectorize/ptr-induction.ll
17	It's the "and i32 %acc.07, 255" that's causing it. I'll remove that particular instruction, but reductions with ANDs in them have only very recently become analyzable.
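
For readers without the test at hand, here is a hypothetical source loop of the kind under discussion: it is controlled by a pointer comparison rather than an integer counter, and it carries an accumulator. This is an assumed shape for illustration, not the actual source behind ptr-induction.ll, and the function name is made up.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical loop with a pointer induction variable and no integer counter.
// The number of iterations is determined by the two pointers, which is how,
// per the discussion above, the exit count can end up with pointer type.
int sum_bytes(const unsigned char *p, const unsigned char *end) {
  int acc = 0;
  while (p != end) {
    acc += *p;  // plain add reduction, no masking of the accumulator
    ++p;
  }
  return acc;
}
```

A caller would pass a buffer and its one-past-the-end pointer, for example sum_bytes(buf, buf + len).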

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	164 lines
test/Transforms/LoopVectorize/ptr-induction.ll	34 lines

Diff 33697

lib/Transforms/Vectorize/LoopVectorize.cpp

[461 lines not shown]	protected:
///The scalar loop body.		///The scalar loop body.
BasicBlock *LoopScalarBody;		BasicBlock *LoopScalarBody;
/// A list of all bypass blocks. The first block is the entry of the loop.		/// A list of all bypass blocks. The first block is the entry of the loop.
SmallVector<BasicBlock *, 4> LoopBypassBlocks;		SmallVector<BasicBlock *, 4> LoopBypassBlocks;

/// The new Induction variable which was added to the new block.		/// The new Induction variable which was added to the new block.
PHINode *Induction;		PHINode *Induction;
/// The induction variable of the old basic block.		/// The induction variable of the old basic block.
PHINode *OldInduction;		PHINode *OldInduction;
/// Holds the extended (to the widest induction type) start index.
Value *ExtendedIdx;
/// Maps scalars to widened vectors.		/// Maps scalars to widened vectors.
ValueMap WidenMap;		ValueMap WidenMap;
EdgeMaskCache MaskCache;		EdgeMaskCache MaskCache;

LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;

// Record whether runtime check is added.		// Record whether runtime check is added.
bool AddedSafetyChecks;		bool AddedSafetyChecks;
};		};
[2,118 lines not shown]	void InnerLoopVectorizer::createEmptyLoop() {
BasicBlock *ExitBlock = OrigLoop->getExitBlock();		BasicBlock *ExitBlock = OrigLoop->getExitBlock();
assert(VectorPH && "Invalid loop structure");		assert(VectorPH && "Invalid loop structure");
assert(ExitBlock && "Must have an exit block");		assert(ExitBlock && "Must have an exit block");

// Some loops have a single integer induction variable, while other loops		// Some loops have a single integer induction variable, while other loops
// don't. One example is c++ iterators that often have multiple pointer		// don't. One example is c++ iterators that often have multiple pointer
// induction variables. In the code below we also support a case where we		// induction variables. In the code below we also support a case where we
// don't have a single induction variable.		// don't have a single induction variable.
		//
		// We try to obtain an induction variable from the original loop as hard
		// as possible. However if we don't find one that:
		// - is an integer
		// - counts from zero, stepping by one
		// - is the size of the widest induction variable type
		// then we create a new one.
OldInduction = Legal->getInduction();		OldInduction = Legal->getInduction();
Type *IdxTy = Legal->getWidestInductionType();		Type *IdxTy = Legal->getWidestInductionType();

// Find the loop boundaries.		// Find the loop boundaries.
const SCEV *ExitCount = SE->getBackedgeTakenCount(OrigLoop);		const SCEV *ExitCount = SE->getBackedgeTakenCount(OrigLoop);
assert(ExitCount != SE->getCouldNotCompute() && "Invalid loop count");		assert(ExitCount != SE->getCouldNotCompute() && "Invalid loop count");

// The exit count might have the type of i64 while the phi is i32. This can		// The exit count might have the type of i64 while the phi is i32. This can
// happen if we have an induction variable that is sign extended before the		// happen if we have an induction variable that is sign extended before the
// compare. The only way that we get a backedge taken count is that the		// compare. The only way that we get a backedge taken count is that the
// induction variable was signed and as such will not overflow. In such a case		// induction variable was signed and as such will not overflow. In such a case
// truncation is legal.		// truncation is legal.
if (ExitCount->getType()->getPrimitiveSizeInBits() >		if (ExitCount->getType()->getPrimitiveSizeInBits() >
IdxTy->getPrimitiveSizeInBits())		IdxTy->getPrimitiveSizeInBits())
[26 lines not shown]	ExitCountValue = CastInst::CreatePointerCast(ExitCountValue, IdxTy,
VectorPH->getTerminator());		VectorPH->getTerminator());

Instruction *CheckMinIters =		Instruction *CheckMinIters =
CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_ULT, ExitCountValue,		CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_ULT, ExitCountValue,
ConstantInt::get(ExitCountValue->getType(), VF * UF),		ConstantInt::get(ExitCountValue->getType(), VF * UF),
"min.iters.check", VectorPH->getTerminator());		"min.iters.check", VectorPH->getTerminator());

Builder.SetInsertPoint(VectorPH->getTerminator());		Builder.SetInsertPoint(VectorPH->getTerminator());
Value *StartIdx = ExtendedIdx = ConstantInt::get(IdxTy, 0);		Value *StartIdx = ConstantInt::get(IdxTy, 0);

// Count holds the overall loop count (N).
Value *Count = Exp.expandCodeFor(ExitCount, ExitCount->getType(),
VectorPH->getTerminator());

LoopBypassBlocks.push_back(VectorPH);		LoopBypassBlocks.push_back(VectorPH);

// Split the single block loop into the two loop structure described above.		// Split the single block loop into the two loop structure described above.
BasicBlock *VecBody =		BasicBlock *VecBody =
VectorPH->splitBasicBlock(VectorPH->getTerminator(), "vector.body");		VectorPH->splitBasicBlock(VectorPH->getTerminator(), "vector.body");
BasicBlock *MiddleBlock =		BasicBlock *MiddleBlock =
VecBody->splitBasicBlock(VecBody->getTerminator(), "middle.block");		VecBody->splitBasicBlock(VecBody->getTerminator(), "middle.block");
BasicBlock *ScalarPH =		BasicBlock *ScalarPH =
MiddleBlock->splitBasicBlock(MiddleBlock->getTerminator(), "scalar.ph");		MiddleBlock->splitBasicBlock(MiddleBlock->getTerminator(), "scalar.ph");

// Create and register the new vector loop.		// Create and register the new vector loop.
Loop* Lp = new Loop();		Loop* Lp = new Loop();
Loop *ParentLoop = OrigLoop->getParentLoop();		Loop *ParentLoop = OrigLoop->getParentLoop();

// Insert the new loop into the loop nest and register the new basic blocks		// Insert the new loop into the loop nest and register the new basic blocks
// before calling any utilities such as SCEV that require valid LoopInfo.		// before calling any utilities such as SCEV that require valid LoopInfo.
if (ParentLoop) {		if (ParentLoop) {
ParentLoop->addChildLoop(Lp);		ParentLoop->addChildLoop(Lp);
ParentLoop->addBasicBlockToLoop(ScalarPH, *LI);		ParentLoop->addBasicBlockToLoop(ScalarPH, *LI);
ParentLoop->addBasicBlockToLoop(MiddleBlock, *LI);		ParentLoop->addBasicBlockToLoop(MiddleBlock, *LI);
} else {		} else {
LI->addTopLevelLoop(Lp);		LI->addTopLevelLoop(Lp);
}		}
Lp->addBasicBlockToLoop(VecBody, *LI);		Lp->addBasicBlockToLoop(VecBody, *LI);

// Use this IR builder to create the loop instructions (Phi, Br, Cmp)		// Use this IR builder to create the loop instructions (Phi, Br, Cmp)
// inside the loop.		// inside the loop.
Builder.SetInsertPoint(VecBody->getFirstNonPHI());		Builder.SetInsertPoint(VecBody->getFirstNonPHI());

// Generate the induction variable.		// Generate the induction variable.
setDebugLocFromInst(Builder, getDebugLocFromInstOrOperands(OldInduction));		setDebugLocFromInst(Builder, getDebugLocFromInstOrOperands(OldInduction));
Induction = Builder.CreatePHI(IdxTy, 2, "index");		Induction = Builder.CreatePHI(IdxTy, 2, "index");
// The loop step is equal to the vectorization factor (num of SIMD elements)		// The loop step is equal to the vectorization factor (num of SIMD elements)
// times the unroll factor (num of SIMD instructions).		// times the unroll factor (num of SIMD instructions).
Constant *Step = ConstantInt::get(IdxTy, VF * UF);		Constant *Step = ConstantInt::get(IdxTy, VF * UF);

// Generate code to check that the loop's trip count is not less than the		// Generate code to check that the loop's trip count is not less than the
// minimum loop iteration number threshold.		// minimum loop iteration number threshold.
BasicBlock *NewVectorPH =		BasicBlock *NewVectorPH =
VectorPH->splitBasicBlock(VectorPH->getTerminator(), "min.iters.checked");		VectorPH->splitBasicBlock(VectorPH->getTerminator(), "min.iters.checked");
if (ParentLoop)		if (ParentLoop)
ParentLoop->addBasicBlockToLoop(NewVectorPH, *LI);		ParentLoop->addBasicBlockToLoop(NewVectorPH, *LI);
ReplaceInstWithInst(VectorPH->getTerminator(),		ReplaceInstWithInst(VectorPH->getTerminator(),
BranchInst::Create(ScalarPH, NewVectorPH, CheckMinIters));		BranchInst::Create(ScalarPH, NewVectorPH, CheckMinIters));
VectorPH = NewVectorPH;		VectorPH = NewVectorPH;

// This is the IR builder that we use to add all of the logic for bypassing		// This is the IR builder that we use to add all of the logic for bypassing
// the new vector loop.		// the new vector loop.
IRBuilder<> BypassBuilder(VectorPH->getTerminator());		IRBuilder<> BypassBuilder(VectorPH->getTerminator());
setDebugLocFromInst(BypassBuilder,		setDebugLocFromInst(BypassBuilder,
getDebugLocFromInstOrOperands(OldInduction));		getDebugLocFromInstOrOperands(OldInduction));

// We may need to extend the index in case there is a type mismatch.
// We know that the count starts at zero and does not overflow.
if (Count->getType() != IdxTy) {
// The exit count can be of pointer type. Convert it to the correct
// integer type.
if (ExitCount->getType()->isPointerTy())
Count = BypassBuilder.CreatePointerCast(Count, IdxTy, "ptrcnt.to.int");
else
Count = BypassBuilder.CreateZExtOrTrunc(Count, IdxTy, "cnt.cast");
}

// Add the start index to the loop count to get the new end index.		// Add the start index to the loop count to get the new end index.
Value *IdxEnd = BypassBuilder.CreateAdd(Count, StartIdx, "end.idx");		Value *IdxEnd = BypassBuilder.CreateAdd(ExitCountValue, StartIdx, "end.idx");

// Now we need to generate the expression for N - (N % VF), which is		// Now we need to generate the expression for N - (N % VF), which is
// the part that the vectorized body will execute.		// the part that the vectorized body will execute.
Value *R = BypassBuilder.CreateURem(Count, Step, "n.mod.vf");		Value *R = BypassBuilder.CreateURem(ExitCountValue, Step, "n.mod.vf");
Value *CountRoundDown = BypassBuilder.CreateSub(Count, R, "n.vec");		Value *CountRoundDown = BypassBuilder.CreateSub(ExitCountValue, R, "n.vec");
Value *IdxEndRoundDown = BypassBuilder.CreateAdd(CountRoundDown, StartIdx,		Value *IdxEndRoundDown = BypassBuilder.CreateAdd(CountRoundDown, StartIdx,
"end.idx.rnd.down");		"end.idx.rnd.down");

// Now, compare the new count to zero. If it is zero skip the vector loop and		// Now, compare the new count to zero. If it is zero skip the vector loop and
// jump to the scalar loop.		// jump to the scalar loop.
Value *Cmp =		Value *Cmp =
BypassBuilder.CreateICmpEQ(IdxEndRoundDown, StartIdx, "cmp.zero");		BypassBuilder.CreateICmpEQ(IdxEndRoundDown, StartIdx, "cmp.zero");
NewVectorPH =		NewVectorPH =
[59 lines not shown]	void InnerLoopVectorizer::createEmptyLoop() {
// We are going to resume the execution of the scalar loop.		// We are going to resume the execution of the scalar loop.
// Go over all of the induction variables that we found and fix the		// Go over all of the induction variables that we found and fix the
// PHIs that are left in the scalar version of the loop.		// PHIs that are left in the scalar version of the loop.
// The starting values of PHI nodes depend on the counter of the last		// The starting values of PHI nodes depend on the counter of the last
// iteration in the vectorized loop.		// iteration in the vectorized loop.
// If we come from a bypass edge then we need to start from the original		// If we come from a bypass edge then we need to start from the original
// start value.		// start value.

// This variable saves the new starting index for the scalar loop.		// This variable saves the new starting index for the scalar loop. It is used
		// to test if there are any tail iterations left once the vector loop has
		// completed.
PHINode *ResumeIndex = nullptr;		PHINode *ResumeIndex = nullptr;
LoopVectorizationLegality::InductionList::iterator I, E;		LoopVectorizationLegality::InductionList::iterator I, E;
LoopVectorizationLegality::InductionList *List = Legal->getInductionVars();		LoopVectorizationLegality::InductionList *List = Legal->getInductionVars();
// Set builder to point to last bypass block.		// Set builder to point to last bypass block.
BypassBuilder.SetInsertPoint(LoopBypassBlocks.back()->getTerminator());		BypassBuilder.SetInsertPoint(LoopBypassBlocks.back()->getTerminator());
for (I = List->begin(), E = List->end(); I != E; ++I) {		for (I = List->begin(), E = List->end(); I != E; ++I) {
PHINode *OrigPhi = I->first;		PHINode *OrigPhi = I->first;
InductionDescriptor II = I->second;		InductionDescriptor II = I->second;

Type *ResumeValTy = (OrigPhi == OldInduction) ? IdxTy : OrigPhi->getType();		PHINode *ResumeVal = PHINode::Create(OrigPhi->getType(), 2, "resume.val",
PHINode *ResumeVal = PHINode::Create(ResumeValTy, 2, "resume.val",
MiddleBlock->getTerminator());		MiddleBlock->getTerminator());
// We might have extended the type of the induction variable but we need a
// truncated version for the scalar loop.
PHINode *TruncResumeVal = (OrigPhi == OldInduction) ?
PHINode::Create(OrigPhi->getType(), 2, "trunc.resume.val",
MiddleBlock->getTerminator()) : nullptr;

// Create phi nodes to merge from the backedge-taken check block.		// Create phi nodes to merge from the backedge-taken check block.
PHINode *BCResumeVal = PHINode::Create(ResumeValTy, 3, "bc.resume.val",		PHINode *BCResumeVal = PHINode::Create(OrigPhi->getType(), 3,
		"bc.resume.val",
ScalarPH->getTerminator());		ScalarPH->getTerminator());
BCResumeVal->addIncoming(ResumeVal, MiddleBlock);		BCResumeVal->addIncoming(ResumeVal, MiddleBlock);

PHINode *BCTruncResumeVal = nullptr;		Value *EndValue;
if (OrigPhi == OldInduction) {		if (OrigPhi == OldInduction) {
BCTruncResumeVal =
PHINode::Create(OrigPhi->getType(), 2, "bc.trunc.resume.val",
ScalarPH->getTerminator());
BCTruncResumeVal->addIncoming(TruncResumeVal, MiddleBlock);
}

Value *EndValue = nullptr;
switch (II.getKind()) {
case InductionDescriptor::IK_NoInduction:
llvm_unreachable("Unknown induction");
case InductionDescriptor::IK_IntInduction: {
// Handle the integer induction counter.
assert(OrigPhi->getType()->isIntegerTy() && "Invalid type");

// We have the canonical induction variable.
if (OrigPhi == OldInduction) {
// Create a truncated version of the resume value for the scalar loop,
// we might have promoted the type to a larger width.
EndValue =
BypassBuilder.CreateTrunc(IdxEndRoundDown, OrigPhi->getType());
// The new PHI merges the original incoming value, in case of a bypass,
// or the value at the end of the vectorized loop.
for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I)
TruncResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[I]);
TruncResumeVal->addIncoming(EndValue, VecBody);

BCTruncResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[0]);

// We know what the end value is.		// We know what the end value is.
EndValue = IdxEndRoundDown;		EndValue = IdxEndRoundDown;
// We also know which PHI node holds it.		// We also know which PHI node holds it.
ResumeIndex = ResumeVal;		ResumeIndex = ResumeVal;
break;		} else {
}

// Not the canonical induction variable - add the vector loop count to the
// start value.
Value *CRD = BypassBuilder.CreateSExtOrTrunc(CountRoundDown,
II.getStartValue()->getType(),
"cast.crd");
EndValue = II.transform(BypassBuilder, CRD);
EndValue->setName("ind.end");
break;
}
case InductionDescriptor::IK_PtrInduction: {
Value *CRD = BypassBuilder.CreateSExtOrTrunc(CountRoundDown,		Value *CRD = BypassBuilder.CreateSExtOrTrunc(CountRoundDown,
II.getStepValue()->getType(),		II.getStepValue()->getType(),
"cast.crd");		"cast.crd");
EndValue = II.transform(BypassBuilder, CRD);		EndValue = II.transform(BypassBuilder, CRD);
EndValue->setName("ptr.ind.end");		EndValue->setName("ind.end");
break;
}		}
}// end of case

// The new PHI merges the original incoming value, in case of a bypass,		// The new PHI merges the original incoming value, in case of a bypass,
// or the value at the end of the vectorized loop.		// or the value at the end of the vectorized loop.
for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I) {		for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I)
if (OrigPhi == OldInduction)
ResumeVal->addIncoming(StartIdx, LoopBypassBlocks[I]);
else
ResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[I]);		ResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[I]);
}
ResumeVal->addIncoming(EndValue, VecBody);		ResumeVal->addIncoming(EndValue, VecBody);

// Fix the scalar body counter (PHI node).		// Fix the scalar body counter (PHI node).
unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH);		unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH);

// The old induction's phi node in the scalar body needs the truncated		// The old induction's phi node in the scalar body needs the truncated
// value.		// value.
if (OrigPhi == OldInduction) {
BCResumeVal->addIncoming(StartIdx, LoopBypassBlocks[0]);
OrigPhi->setIncomingValue(BlockIdx, BCTruncResumeVal);
} else {
BCResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[0]);		BCResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[0]);
OrigPhi->setIncomingValue(BlockIdx, BCResumeVal);		OrigPhi->setIncomingValue(BlockIdx, BCResumeVal);
}		}
}

// If we are generating a new induction variable then we also need to		// If we are generating a new induction variable then we also need to
// generate the code that calculates the exit value. This value is not		// generate the code that calculates the exit value. This value is not
// simply the end of the counter because we may skip the vectorized body		// simply the end of the counter because we may skip the vectorized body
// in case of a runtime check.		// in case of a runtime check.
if (!OldInduction){		if (!OldInduction){
assert(!ResumeIndex && "Unexpected resume value found");		assert(!ResumeIndex && "Unexpected resume value found");
ResumeIndex = PHINode::Create(IdxTy, 2, "new.indc.resume.val",		ResumeIndex = PHINode::Create(IdxTy, 2, "new.indc.resume.val",
[545 lines not shown]	void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN,
unsigned UF, unsigned VF, PhiVector *PV) {		unsigned UF, unsigned VF, PhiVector *PV) {
PHINode* P = cast<PHINode>(PN);		PHINode* P = cast<PHINode>(PN);
// Handle reduction variables:		// Handle reduction variables:
if (Legal->getReductionVars()->count(P)) {		if (Legal->getReductionVars()->count(P)) {
for (unsigned part = 0; part < UF; ++part) {		for (unsigned part = 0; part < UF; ++part) {
// This is phase one of vectorizing PHIs.		// This is phase one of vectorizing PHIs.
Type *VecTy = (VF == 1) ? PN->getType() :		Type *VecTy = (VF == 1) ? PN->getType() :
VectorType::get(PN->getType(), VF);		VectorType::get(PN->getType(), VF);
Entry[part] = PHINode::Create(VecTy, 2, "vec.phi",		Entry[part] = PHINode::Create(VecTy, 2, "vec.phi",
LoopVectorBody.back()-> getFirstInsertionPt());		LoopVectorBody.back()-> getFirstInsertionPt());
}		}
PV->push_back(P);		PV->push_back(P);
return;		return;
}		}

setDebugLocFromInst(Builder, P);		setDebugLocFromInst(Builder, P);
// Check for PHI nodes that are lowered to vector selects.		// Check for PHI nodes that are lowered to vector selects.
if (P->getParent() != OrigLoop->getHeader()) {		if (P->getParent() != OrigLoop->getHeader()) {
// We know that all PHIs in non-header blocks are converted into		// We know that all PHIs in non-header blocks are converted into
// selects, so we don't have to worry about the insertion order and we		// selects, so we don't have to worry about the insertion order and we
// can just use the builder.		// can just use the builder.
// At this point we generate the predication tree. There may be		// At this point we generate the predication tree. There may be
// duplications since this is a simple recursive scan, but future		// duplications since this is a simple recursive scan, but future
// optimizations will clean it up.		// optimizations will clean it up.

unsigned NumIncoming = P->getNumIncomingValues();		unsigned NumIncoming = P->getNumIncomingValues();

// Generate a sequence of selects of the form:		// Generate a sequence of selects of the form:
[30 lines not shown]	void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN,

// FIXME: The newly created binary instructions should contain nsw/nuw flags,		// FIXME: The newly created binary instructions should contain nsw/nuw flags,
// which can be found from the original scalar operations.		// which can be found from the original scalar operations.
switch (II.getKind()) {		switch (II.getKind()) {
case InductionDescriptor::IK_NoInduction:		case InductionDescriptor::IK_NoInduction:
llvm_unreachable("Unknown induction");		llvm_unreachable("Unknown induction");
case InductionDescriptor::IK_IntInduction: {		case InductionDescriptor::IK_IntInduction: {
assert(P->getType() == II.getStartValue()->getType() && "Types must match");		assert(P->getType() == II.getStartValue()->getType() && "Types must match");
Type *PhiTy = P->getType();
Value *Broadcasted;
if (P == OldInduction) {
// Handle the canonical induction variable. We might have had to
// extend the type.
Broadcasted = Builder.CreateTrunc(Induction, PhiTy);
} else {
// Handle other induction variables that are now based on the		// Handle other induction variables that are now based on the
// canonical one.		// canonical one.
auto *V = Builder.CreateSExtOrTrunc(Induction, PhiTy);		Value *V = Induction;
Broadcasted = II.transform(Builder, V);		if (P != OldInduction) {
Broadcasted->setName("offset.idx");		V = Builder.CreateSExtOrTrunc(Induction, P->getType());
		V = II.transform(Builder, V);
		V->setName("offset.idx");
}		}
Broadcasted = getBroadcastInstrs(Broadcasted);		Value *Broadcasted = getBroadcastInstrs(V);
// After broadcasting the induction variable we need to make the vector		// After broadcasting the induction variable we need to make the vector
// consecutive by adding 0, 1, 2, etc.		// consecutive by adding 0, 1, 2, etc.
for (unsigned part = 0; part < UF; ++part)		for (unsigned part = 0; part < UF; ++part)
Entry[part] = getStepVector(Broadcasted, VF * part, II.getStepValue());		Entry[part] = getStepVector(Broadcasted, VF * part, II.getStepValue());
return;		return;
}		}
case InductionDescriptor::IK_PtrInduction:		case InductionDescriptor::IK_PtrInduction:
// Handle the pointer induction variable case.		// Handle the pointer induction variable case.
assert(P->getType()->isPointerTy() && "Unexpected type.");		assert(P->getType()->isPointerTy() && "Unexpected type.");
// This is the normalized GEP that starts counting at zero.		// This is the normalized GEP that starts counting at zero.
Value *NormalizedIdx =		Value *PtrInd = Induction;
Builder.CreateSub(Induction, ExtendedIdx, "normalized.idx");		PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStepValue()->getType());
NormalizedIdx =
Builder.CreateSExtOrTrunc(NormalizedIdx, II.getStepValue()->getType());
// This is the vector of results. Notice that we don't generate		// This is the vector of results. Notice that we don't generate
// vector geps because scalar geps result in better code.		// vector geps because scalar geps result in better code.
for (unsigned part = 0; part < UF; ++part) {		for (unsigned part = 0; part < UF; ++part) {
if (VF == 1) {		if (VF == 1) {
int EltIndex = part;		int EltIndex = part;
Constant *Idx = ConstantInt::get(NormalizedIdx->getType(), EltIndex);		Constant *Idx = ConstantInt::get(PtrInd->getType(), EltIndex);
Value *GlobalIdx = Builder.CreateAdd(NormalizedIdx, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep = II.transform(Builder, GlobalIdx);		Value *SclrGep = II.transform(Builder, GlobalIdx);
SclrGep->setName("next.gep");		SclrGep->setName("next.gep");
Entry[part] = SclrGep;		Entry[part] = SclrGep;
continue;		continue;
}		}

Value *VecVal = UndefValue::get(VectorType::get(P->getType(), VF));		Value *VecVal = UndefValue::get(VectorType::get(P->getType(), VF));
for (unsigned int i = 0; i < VF; ++i) {		for (unsigned int i = 0; i < VF; ++i) {
int EltIndex = i + part * VF;		int EltIndex = i + part * VF;
Constant *Idx = ConstantInt::get(NormalizedIdx->getType(), EltIndex);		Constant *Idx = ConstantInt::get(PtrInd->getType(), EltIndex);
Value *GlobalIdx = Builder.CreateAdd(NormalizedIdx, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep = II.transform(Builder, GlobalIdx);		Value *SclrGep = II.transform(Builder, GlobalIdx);
SclrGep->setName("next.gep");		SclrGep->setName("next.gep");
VecVal = Builder.CreateInsertElement(VecVal, SclrGep,		VecVal = Builder.CreateInsertElement(VecVal, SclrGep,
Builder.getInt32(i),		Builder.getInt32(i),
"insert.gep");		"insert.gep");
}		}
Entry[part] = VecVal;		Entry[part] = VecVal;
}		}
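To make the indexing in the pointer-induction case concrete, here is a minimal standalone sketch of which scalar indices are formed for one unrolled part. It is not code from this patch: `laneIndices` is a hypothetical helper, and plain integers stand in for the `Value`s that the real code builds with `Builder.CreateAdd` and then passes to `II.transform`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LoopVectorize.cpp): lists the scalar
// indices that would be added to the induction value for one unrolled part.
std::vector<std::int64_t> laneIndices(std::int64_t PtrInd, unsigned VF,
                                      unsigned Part) {
  assert(VF >= 1 && "vectorization factor must be at least 1");
  std::vector<std::int64_t> Indices;
  if (VF == 1) {
    // Unroll-only case: one scalar GEP per part, offset by the part number.
    Indices.push_back(PtrInd + static_cast<std::int64_t>(Part));
    return Indices;
  }
  // Vector case: lane I of part Part uses element index I + Part * VF.
  for (unsigned I = 0; I < VF; ++I)
    Indices.push_back(PtrInd + static_cast<std::int64_t>(I + Part * VF));
  return Indices;
}
```

Under these assumptions, `laneIndices(8, 4, 1)` yields the indices 12, 13, 14 and 15, i.e. the second group of four lanes starting from an induction value of 8.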
(651 lines not shown)
if (!Induction) {		if (!Induction) {
DEBUG(dbgs() << "LV: Did not find one integer induction var.\n");		DEBUG(dbgs() << "LV: Did not find one integer induction var.\n");
if (Inductions.empty()) {		if (Inductions.empty()) {
emitAnalysis(VectorizationReport()		emitAnalysis(VectorizationReport()
<< "loop induction variable could not be identified");		<< "loop induction variable could not be identified");
return false;		return false;
}		}
}		}

		// Now we know the widest induction type, check if our found induction
		// is the same size. If it's not, unset it here and InnerLoopVectorizer
		// will create another.
		if (Induction && WidestIndTy != Induction->getType())
		Induction = nullptr;

return true;		return true;
}		}
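The new legality check above can be modelled roughly as follows. This is a simplified sketch, not the patch's code: `keepFoundInduction` is a hypothetical name, and each induction is reduced to a bit width, whereas the real check compares `WidestIndTy` against `Induction->getType()`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: every induction is represented only by its bit width.
// Returns true if the induction that was found may be reused, i.e. it is
// already as wide as the widest induction in the loop.
bool keepFoundInduction(unsigned FoundBits,
                        const std::vector<unsigned> &InductionBits) {
  assert(!InductionBits.empty() && "expected at least one induction");
  unsigned WidestBits =
      *std::max_element(InductionBits.begin(), InductionBits.end());
  // When this is false, the patch unsets Induction and leaves it to
  // InnerLoopVectorizer to create a new, wide induction variable.
  return FoundBits == WidestBits;
}
```

For example, with a 32-bit and a 64-bit induction in the loop, a found 32-bit induction would be dropped and a found 64-bit one kept.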

void LoopVectorizationLegality::collectStridedAccess(Value *MemAccess) {		void LoopVectorizationLegality::collectStridedAccess(Value *MemAccess) {
Value *Ptr = nullptr;		Value *Ptr = nullptr;
if (LoadInst *LI = dyn_cast<LoadInst>(MemAccess))		if (LoadInst *LI = dyn_cast<LoadInst>(MemAccess))
Ptr = LI->getPointerOperand();		Ptr = LI->getPointerOperand();
else if (StoreInst *SI = dyn_cast<StoreInst>(MemAccess))		else if (StoreInst *SI = dyn_cast<StoreInst>(MemAccess))
(1,269 lines not shown)

test/Transforms/LoopVectorize/ptr-induction.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -S | FileCheck %s

				; This testcase causes SCEV to return a pointer-typed exit value.
				anemet (Done): Doesn't this test need a datalayout to ensure that pointers are 64-bit?

				; CHECK: @f
				; CHECK: %index.next = add i64 %index, 4
				anemet (Done): So I guess you're checking here that we have converted this into an integer induction variable? Perhaps a comment would be good.
				define i8 @f(i8* readonly %a, i8* readnone %b) #0 {
				entry:
				%cmp.6 = icmp ult i8* %a, %b
				br i1 %cmp.6, label %while.body.preheader, label %while.end

				while.body.preheader: ; preds = %entry
				br label %while.body

				while.body: ; preds = %while.body.preheader, %while.body
				%a.pn = phi i8* [ %incdec.ptr8, %while.body ], [ %a, %while.body.preheader ]
				%acc.07 = phi i32 [ %add, %while.body ], [ 0, %while.body.preheader ]
				anemet (Not Done): When I tried this with my not completely up-to-date sources, vectorization failed with: "LV: Found an unidentified PHI. %acc.07 = phi i32 [ %add, %while.body ], [ 0, %while.body.preheader ]". I think we should just remove this and its uses further down.
				jmolloy (Author, Not Done): It's the "and i32 %acc.07, 255" that's causing it. I'll remove that particular instruction, but reductions with ANDs in them have only become analyzable very recently.
				%incdec.ptr8 = getelementptr inbounds i8, i8* %a.pn, i64 1
				%0 = load i8, i8* %incdec.ptr8, align 1
				%conv = zext i8 %0 to i32
				%conv1 = and i32 %acc.07, 255
				%add = add nuw nsw i32 %conv, %conv1
				%exitcond = icmp eq i8* %incdec.ptr8, %b
				br i1 %exitcond, label %while.cond.while.end_crit_edge, label %while.body

				while.cond.while.end_crit_edge: ; preds = %while.body
				%add.lcssa = phi i32 [ %add, %while.body ]
				%conv2 = trunc i32 %add.lcssa to i8
				br label %while.end

				while.end: ; preds = %while.cond.while.end_crit_edge, %entry
				%acc.0.lcssa = phi i8 [ %conv2, %while.cond.while.end_crit_edge ], [ 0, %entry ]
				ret i8 %acc.0.lcssa
				}
				No newline at end of file