This is an archive of the discontinued LLVM Phabricator instance.

Differential D63686

[LFTR] Fix post-inc pointer IV with truncated exit count (PR41998)
ClosedPublic

Authored by nikic on Jun 22 2019, 3:09 AM.

Details

Reviewers

reames
sanjoy

Commits

rG2d756c4feb69: [LFTR] Fix post-inc pointer IV with truncated exit count (PR41998)
rL364709: [LFTR] Fix post-inc pointer IV with truncated exit count (PR41998)

Summary

Fixes https://bugs.llvm.org/show_bug.cgi?id=41998. Usually when we have a truncated exit count we'll truncate the IV when comparing against the limit, in which case exit count overflow in post-inc form doesn't matter. However, for pointer IVs we don't do that, so we have to be careful about incrementing the IV in the wide type.

I'm fixing this by removing the IVCount variable (which was ExitCount or ExitCount+1) and replacing it with a UsePostInc flag, and then moving the actual limit adjustment to the individual cases (which are pointer IV where we add to the wide type, integer IV where we add to the narrow type and constant integer IV where we add to the wide type).
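
To make the overflow concrete, here is a minimal sketch that models the two orders of operation with plain integers. It is not LLVM code: the helper names `limitNarrowThenExtend` and `limitExtendThenAdd` and the constant `Bits` are assumptions made for illustration, and the 3-bit width is chosen only to mirror the `i3` type that appears in lftr-pr41998.ll.

```cpp
#include <cstdint>

// Hypothetical model, not LLVM API: the exit count lives in a narrow
// Bits-wide unsigned type, while the pointer IV offset is 64 bits wide.
constexpr unsigned Bits = 3;
constexpr uint64_t Mask = (uint64_t{1} << Bits) - 1;

// Add one in the narrow type (the add may wrap), then zero-extend.
// This models forming ExitCount+1 before widening.
uint64_t limitNarrowThenExtend(uint64_t ExitCount) {
  return (ExitCount + 1) & Mask;
}

// Zero-extend first, then add one in the wide type.
// This models the pointer IV case after the patch.
uint64_t limitExtendThenAdd(uint64_t ExitCount) {
  return (ExitCount & Mask) + 1;
}
```

For an exit count of 4 both helpers return 5. For an exit count of 7, the narrow maximum, the narrow add wraps to 0 while the wide add returns 8, which is the offset a post-inc pointer comparison needs.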

Diff Detail

Repository: rL LLVM

Event Timeline

nikic created this revision. Jun 22 2019, 3:09 AM

Herald added a project: Restricted Project. Jun 22 2019, 3:09 AM

Herald added subscribers: llvm-commits, hiraditya.

FWIW, the original test case with pre-increment is fixed by this, e.g.:

#include <stdio.h>

__attribute__((__noinline__)) void parse_pre(const char *dataptr) {
  int col;
  char line[1024];
  char *lineptr;

  for (col = 0, lineptr = line; *dataptr != '\0'; dataptr++) {
    printf("before inner loop: col=%d, diff=%td\n", col, lineptr - line);
    do {
      printf("inside inner loop: col=%d, diff=%td\n", col, lineptr - line);
      *++lineptr = ' ';
      col++;
    } while (col & 7);
  }
}

int main(void) {
  parse_pre("a");

  return 0;
}

now runs the correct number of loops:

$ ./ps-pdf-pre-r364133-D63686
before inner loop: col=0, diff=0
inside inner loop: col=0, diff=0
inside inner loop: col=1, diff=1
inside inner loop: col=2, diff=2
inside inner loop: col=3, diff=3
inside inner loop: col=4, diff=4
inside inner loop: col=5, diff=5
inside inner loop: col=6, diff=6
inside inner loop: col=7, diff=7

The IR difference is this:

@@ -54,7 +54,7 @@ do.body:                                          ; pr
   %9 = load i8*, i8** %lineptr, align 8, !tbaa !2
   %incdec.ptr = getelementptr inbounds i8, i8* %9, i32 1
   store i8* %incdec.ptr, i8** %lineptr, align 8, !tbaa !2
-  store i8 32, i8* %9, align 1, !tbaa !8
+  store i8 32, i8* %incdec.ptr, align 1, !tbaa !8
   %10 = load i32, i32* %col, align 4, !tbaa !6
   %inc = add nsw i32 %10, 1
   store i32 %inc, i32* %col, align 4, !tbaa !6

I'm hesitant about this. Not because your patch is wrong, but because we've ended up with a ton of complexity involving the post-increment case. I'd really like to take a step back and see if we can factor apart some code here.

Assume for the moment that we have a transform which knows how to generate the pre-increment form, can we do post increment conversion as a post processing step?

If we were willing to strip, I think we could just strip flags and add one to both sides of the comparison *in whatever bitwidth the pre-inc form chose*. Do you agree?

If so, then I think we can move the post-inc legality/stripping entirely into a separate transform which runs *after* we've formed the arguments for the new comparison. (Which is slightly different than the current patch as we'd do the pre-to-post *after* the extend/trunc decision.)

I'm also really questioning the value of doing pre-to-post conversion in the middle end at all. I'm getting increasingly tempted to move that to just before codegen, and just strip all the flags without concern. :)
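
As a rough illustration of the "add one to both sides" idea, the sketch below checks that an equality test is unchanged when both operands are incremented in the same fixed bitwidth with wrapping arithmetic. It is a model under assumptions, not LFTR code: the function names are hypothetical, 8 bits stands in for whatever width the pre-inc form chose, and unsigned wrapping models an increment whose nowrap flags have been stripped.

```cpp
#include <cstdint>

// Pre-inc form: compare the IV against the limit directly.
bool exitPreInc(uint8_t IV, uint8_t Limit) { return IV == Limit; }

// Post-inc form: add one to both sides in the same 8-bit width.
// The casts make the wrap-around at 255 explicit.
bool exitPostInc(uint8_t IV, uint8_t Limit) {
  return static_cast<uint8_t>(IV + 1) == static_cast<uint8_t>(Limit + 1);
}

// Exhaustively compare the two forms over every pair of 8-bit values.
bool formsAgreeForAllInputs() {
  for (unsigned IV = 0; IV < 256; ++IV)
    for (unsigned Limit = 0; Limit < 256; ++Limit)
      if (exitPreInc(static_cast<uint8_t>(IV), static_cast<uint8_t>(Limit)) !=
          exitPostInc(static_cast<uint8_t>(IV), static_cast<uint8_t>(Limit)))
        return false;
  return true;
}
```

The equivalence holds only because both adds wrap in the same width. It says nothing about poison: if the increment carried nuw/nsw or inbounds, the post-inc operand could be poison where the pre-inc one is not, which is why the flags would have to be stripped.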

LGTM. As indicated in previous comment, I have some design hesitation here, but the patch is correct and fixes a bug. I'm fine continuing the design discussion after submit as I trust Nikita to engage and follow through as needed.

This revision was not accepted when it landed; it landed in state Needs Review. Jun 29 2019, 2:24 AM

Closed by commit rL364709: [LFTR] Fix post-inc pointer IV with truncated exit count (PR41998) (authored by nikic).

This revision was automatically updated to reflect the committed changes.

In D63686#1562925, @reames wrote:

I'm hesitant about this. Not because your patch is wrong, but because we've ended up with a ton of complexity involving the post-increment case. I'd really like to take a step back and see if we can factor apart some code here.

Assume for the moment that we have a transform which knows how to generate the pre-increment form, can we do post increment conversion as a post processing step?

If we were willing to strip, I think we could just strip flags and add one to both sides of the comparison *in whatever bitwidth the pre-inc form chose*. Do you agree?

If so, then I think we can move the post-inc legality/stripping entirely into a separate transform which runs *after* we've formed the arguments for the new comparison. (Which is slightly different than the current patch as we'd do the pre-to-post *after* the extend/trunc decision.)

I'm also really questioning the value of doing pre-to-post conversion in the middle end at all. I'm getting increasingly tempted to move that to just before codegen, and just strip all the flags without concern. :)

I've been thinking the same thing ... the post-inc conversion here causes a lot of issues for unclear benefit. Some thoughts:

LFTR currently does two things: Simplify an existing exit condition, or switch to a different IV. If we're simplifying an existing post-inc exit condition, we likely want to preserve it in that form. Though I think that this may be a good bit simpler than the general transform, as we shouldn't have to deal with different-width IV and exit count in that case (is that correct?). Generally splitting the exit condition simplification and the IV switch parts of LFTR may be worthwhile, because the latter is much more dicey due to all the undef/poison baggage it comes with.
Ideally we would not use post-inc form in the middle end and let LSR move to post-inc if profitable. Apart from the issues we're seeing here, this post-inc form is also non-canonical -- clearly (a + C1) == C2 should canonically be a == C2-C1, but we have hacks in InstCombine to preserve this for multi-use cases. There is some relevant discussion on D58633 and in particular some samples where using pre-inc form currently causes regressions. The first step would probably be to figure out what is going on there.
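
The canonicalization mentioned above, rewriting (a + C1) == C2 as a == C2-C1, can be checked with a small model. This is a sketch with hypothetical function names, using 8-bit wrapping unsigned arithmetic as a stand-in for an integer add without nowrap flags; it is not InstCombine code.

```cpp
#include <cstdint>

// Post-inc style comparison: (A + C1) == C2, with a wrapping add.
bool cmpPostIncForm(uint8_t A, uint8_t C1, uint8_t C2) {
  return static_cast<uint8_t>(A + C1) == C2;
}

// Canonical comparison: A == C2 - C1, with a wrapping subtract.
bool cmpCanonicalForm(uint8_t A, uint8_t C1, uint8_t C2) {
  return A == static_cast<uint8_t>(C2 - C1);
}

// Check that both forms agree for every 8-bit A, given fixed constants.
bool foldIsSound(uint8_t C1, uint8_t C2) {
  for (unsigned A = 0; A < 256; ++A)
    if (cmpPostIncForm(static_cast<uint8_t>(A), C1, C2) !=
        cmpCanonicalForm(static_cast<uint8_t>(A), C1, C2))
      return false;
  return true;
}
```

The two forms select the same values of `a`, so the choice between them is a matter of canonical form and of whether the incremented value has other uses, not of correctness.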

In D63686#1563412, @nikic wrote:

Ideally we would not use post-inc form in the middle end and let LSR move to post-inc if profitable. Apart from the issues we're seeing here, this post-inc form is also non-canonical -- clearly (a + C1) == C2 should canonically be a == C2-C1, but we have hacks in InstCombine to preserve this for multi-use cases. There is some relevant discussion on D58633 and in particular some samples where using pre-inc form currently causes regressions. The first step would probably be to figure out what is going on there.

After looking into this a bit, I think that the problems of D58633 do not apply here. The issue there is that InstCombine can fold instructions inside loops that LSR does not understand because it requires loop simplify form. This doesn't apply here because indvars also works on loop simplify form. Given that, I think it should be fine to use pre-inc form in LFTR (possibly post-inc only if it's already used) and let LSR pick up the post-inc transform.

In D63686#1563512, @nikic wrote:

In D63686#1563412, @nikic wrote:

Ideally we would not use post-inc form in the middle end and let LSR move to post-inc if profitable. Apart from the issues we're seeing here, this post-inc form is also non-canonical -- clearly (a + C1) == C2 should canonically be a == C2-C1, but we have hacks in InstCombine to preserve this for multi-use cases. There is some relevant discussion on D58633 and in particular some samples where using pre-inc form currently causes regressions. The first step would probably be to figure out what is going on there.

After looking into this a bit, I think that the problems of D58633 do not apply here. The issue there is that InstCombine can fold instructions inside loops that LSR does not understand because it requires loop simplify form. This doesn't apply here because indvars also works on loop simplify form. Given that, I think it should be fine to use pre-inc form in LFTR (possibly post-inc only if it's already used) and let LSR pick up the post-inc transform.

I think we're on the same page here. Any chance you're motivated to actually do this? I'd be happy to review. :)

nikic mentioned this in D64286: [LFTR] Don't use post-inc IV unless already used. Jul 6 2019, 7:52 AM

Revision Contents

Path                                                               Size
llvm/trunk/lib/Transforms/Scalar/IndVarSimplify.cpp                77 lines
llvm/trunk/test/Transforms/IndVarSimplify/2011-11-01-lftrptr.ll    7 lines
llvm/trunk/test/Transforms/IndVarSimplify/lftr-pr41998.ll          5 lines

Diff 207191

llvm/trunk/lib/Transforms/Scalar/IndVarSimplify.cpp

[2,294 lines not shown]
BestPhi = Phi;		BestPhi = Phi;
BestInit = Init;		BestInit = Init;
}		}
return BestPhi;		return BestPhi;
}		}

/// Insert an IR expression which computes the value held by the IV IndVar		/// Insert an IR expression which computes the value held by the IV IndVar
/// (which must be an loop counter w/unit stride) after the backedge of loop L		/// (which must be an loop counter w/unit stride) after the backedge of loop L
/// is taken IVCount times.		/// is taken ExitCount times.
static Value *genLoopLimit(PHINode *IndVar, BasicBlock *ExitingBB,		static Value *genLoopLimit(PHINode *IndVar, BasicBlock *ExitingBB,
const SCEV *IVCount, Loop *L,		const SCEV *ExitCount, bool UsePostInc, Loop *L,
SCEVExpander &Rewriter, ScalarEvolution *SE) {		SCEVExpander &Rewriter, ScalarEvolution *SE) {
assert(isLoopCounter(IndVar, L, SE));		assert(isLoopCounter(IndVar, L, SE));
const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IndVar));		const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IndVar));
const SCEV *IVInit = AR->getStart();		const SCEV *IVInit = AR->getStart();

// IVInit may be a pointer while IVCount is an integer when FindLoopCounter		// IVInit may be a pointer while ExitCount is an integer when FindLoopCounter
// finds a valid pointer IV. Sign extend BECount in order to materialize a		// finds a valid pointer IV. Sign extend ExitCount in order to materialize a
// GEP. Avoid running SCEVExpander on a new pointer value, instead reusing		// GEP. Avoid running SCEVExpander on a new pointer value, instead reusing
// the existing GEPs whenever possible.		// the existing GEPs whenever possible.
if (IndVar->getType()->isPointerTy() && !IVCount->getType()->isPointerTy()) {		if (IndVar->getType()->isPointerTy() &&
		!ExitCount->getType()->isPointerTy()) {
// IVOffset will be the new GEP offset that is interpreted by GEP as a		// IVOffset will be the new GEP offset that is interpreted by GEP as a
// signed value. IVCount on the other hand represents the loop trip count,		// signed value. ExitCount on the other hand represents the loop trip count,
// which is an unsigned value. FindLoopCounter only allows induction		// which is an unsigned value. FindLoopCounter only allows induction
// variables that have a positive unit stride of one. This means we don't		// variables that have a positive unit stride of one. This means we don't
// have to handle the case of negative offsets (yet) and just need to zero		// have to handle the case of negative offsets (yet) and just need to zero
// extend IVCount.		// extend ExitCount.
Type *OfsTy = SE->getEffectiveSCEVType(IVInit->getType());		Type *OfsTy = SE->getEffectiveSCEVType(IVInit->getType());
const SCEV *IVOffset = SE->getTruncateOrZeroExtend(IVCount, OfsTy);		const SCEV *IVOffset = SE->getTruncateOrZeroExtend(ExitCount, OfsTy);
		if (UsePostInc)
		IVOffset = SE->getAddExpr(IVOffset, SE->getOne(OfsTy));

// Expand the code for the iteration count.		// Expand the code for the iteration count.
assert(SE->isLoopInvariant(IVOffset, L) &&		assert(SE->isLoopInvariant(IVOffset, L) &&
"Computed iteration count is not loop invariant!");		"Computed iteration count is not loop invariant!");
BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());		BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());
Value *GEPOffset = Rewriter.expandCodeFor(IVOffset, OfsTy, BI);		Value *GEPOffset = Rewriter.expandCodeFor(IVOffset, OfsTy, BI);

Value *GEPBase = IndVar->getIncomingValueForBlock(L->getLoopPreheader());		Value *GEPBase = IndVar->getIncomingValueForBlock(L->getLoopPreheader());
assert(AR->getStart() == SE->getSCEV(GEPBase) && "bad loop counter");		assert(AR->getStart() == SE->getSCEV(GEPBase) && "bad loop counter");
// We could handle pointer IVs other than i8*, but we need to compensate for		// We could handle pointer IVs other than i8*, but we need to compensate for
// gep index scaling.		// gep index scaling.
assert(SE->getSizeOfExpr(IntegerType::getInt64Ty(IndVar->getContext()),		assert(SE->getSizeOfExpr(IntegerType::getInt64Ty(IndVar->getContext()),
cast<PointerType>(GEPBase->getType())		cast<PointerType>(GEPBase->getType())
->getElementType())->isOne() &&		->getElementType())->isOne() &&
"unit stride pointer IV must be i8*");		"unit stride pointer IV must be i8*");

IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());		IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());
return Builder.CreateGEP(GEPBase->getType()->getPointerElementType(),		return Builder.CreateGEP(GEPBase->getType()->getPointerElementType(),
GEPBase, GEPOffset, "lftr.limit");		GEPBase, GEPOffset, "lftr.limit");
} else {		} else {
// In any other case, convert both IVInit and IVCount to integers before		// In any other case, convert both IVInit and ExitCount to integers before
// comparing. This may result in SCEV expansion of pointers, but in practice		// comparing. This may result in SCEV expansion of pointers, but in practice
// SCEV will fold the pointer arithmetic away as such:		// SCEV will fold the pointer arithmetic away as such:
// BECount = (IVEnd - IVInit - 1) => IVLimit = IVInit (postinc).		// BECount = (IVEnd - IVInit - 1) => IVLimit = IVInit (postinc).
//		//
// Valid Cases: (1) both integers is most common; (2) both may be pointers		// Valid Cases: (1) both integers is most common; (2) both may be pointers
// for simple memset-style loops.		// for simple memset-style loops.
//		//
// IVInit integer and IVCount pointer would only occur if a canonical IV		// IVInit integer and ExitCount pointer would only occur if a canonical IV
// were generated on top of case #2, which is not expected.		// were generated on top of case #2, which is not expected.

assert(AR->getStepRecurrence(*SE)->isOne() && "only handles unit stride");		assert(AR->getStepRecurrence(*SE)->isOne() && "only handles unit stride");
// For unit stride, IVCount = Start + BECount with 2's complement overflow.		// For unit stride, IVCount = Start + ExitCount with 2's complement
		// overflow.
const SCEV *IVInit = AR->getStart();		const SCEV *IVInit = AR->getStart();

// For integer IVs, truncate the IV before computing IVInit + BECount.		// For integer IVs, truncate the IV before computing IVInit + BECount.
if (SE->getTypeSizeInBits(IVInit->getType())		if (SE->getTypeSizeInBits(IVInit->getType())
> SE->getTypeSizeInBits(IVCount->getType()))		> SE->getTypeSizeInBits(ExitCount->getType()))
IVInit = SE->getTruncateExpr(IVInit, IVCount->getType());		IVInit = SE->getTruncateExpr(IVInit, ExitCount->getType());

const SCEV *IVLimit = SE->getAddExpr(IVInit, IVCount);		const SCEV *IVLimit = SE->getAddExpr(IVInit, ExitCount);

		if (UsePostInc)
		IVLimit = SE->getAddExpr(IVLimit, SE->getOne(IVLimit->getType()));

// Expand the code for the iteration count.		// Expand the code for the iteration count.
BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());		BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());
IRBuilder<> Builder(BI);		IRBuilder<> Builder(BI);
assert(SE->isLoopInvariant(IVLimit, L) &&		assert(SE->isLoopInvariant(IVLimit, L) &&
"Computed iteration count is not loop invariant!");		"Computed iteration count is not loop invariant!");
// Ensure that we generate the same type as IndVar, or a smaller integer		// Ensure that we generate the same type as IndVar, or a smaller integer
// type. In the presence of null pointer values, we have an integer type		// type. In the presence of null pointer values, we have an integer type
// SCEV expression (IVInit) for a pointer type IV value (IndVar).		// SCEV expression (IVInit) for a pointer type IV value (IndVar).
Type *LimitTy = IVCount->getType()->isPointerTy() ?		Type *LimitTy = ExitCount->getType()->isPointerTy() ?
IndVar->getType() : IVCount->getType();		IndVar->getType() : ExitCount->getType();
return Rewriter.expandCodeFor(IVLimit, LimitTy, BI);		return Rewriter.expandCodeFor(IVLimit, LimitTy, BI);
}		}
}		}

/// This method rewrites the exit condition of the loop to be a canonical !=		/// This method rewrites the exit condition of the loop to be a canonical !=
/// comparison against the incremented loop induction variable. This pass is		/// comparison against the incremented loop induction variable. This pass is
/// able to rewrite the exit tests of any loop where the SCEV analysis can		/// able to rewrite the exit tests of any loop where the SCEV analysis can
/// determine a loop-invariant trip count of the loop, which is actually a much		/// determine a loop-invariant trip count of the loop, which is actually a much
/// broader range than just linear tests.		/// broader range than just linear tests.
bool IndVarSimplify::		bool IndVarSimplify::
linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,		linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,
const SCEV *ExitCount,		const SCEV *ExitCount,
PHINode *IndVar, SCEVExpander &Rewriter) {		PHINode *IndVar, SCEVExpander &Rewriter) {
assert(L->getLoopLatch() && "Loop no longer in simplified form?");		assert(L->getLoopLatch() && "Loop no longer in simplified form?");
assert(isLoopCounter(IndVar, L, SE));		assert(isLoopCounter(IndVar, L, SE));
Instruction * const IncVar =		Instruction * const IncVar =
cast<Instruction>(IndVar->getIncomingValueForBlock(L->getLoopLatch()));		cast<Instruction>(IndVar->getIncomingValueForBlock(L->getLoopLatch()));

// Initialize CmpIndVar and IVCount to their preincremented values.		// Initialize CmpIndVar to the preincremented IV.
Value *CmpIndVar = IndVar;		Value *CmpIndVar = IndVar;
const SCEV *IVCount = ExitCount;		bool UsePostInc = false;

// If the exiting block is the same as the backedge block, we prefer to		// If the exiting block is the same as the backedge block, we prefer to
// compare against the post-incremented value, otherwise we must compare		// compare against the post-incremented value, otherwise we must compare
// against the preincremented value.		// against the preincremented value.
if (ExitingBB == L->getLoopLatch()) {		if (ExitingBB == L->getLoopLatch()) {
bool SafeToPostInc = IndVar->getType()->isIntegerTy();		bool SafeToPostInc = IndVar->getType()->isIntegerTy();
if (!SafeToPostInc) {		if (!SafeToPostInc) {
// For pointer IVs, we chose to not strip inbounds which requires us not		// For pointer IVs, we chose to not strip inbounds which requires us not
// to add a potentially UB introducing use. We need to either a) show		// to add a potentially UB introducing use. We need to either a) show
// the loop test we're modifying is already in post-inc form, or b) show		// the loop test we're modifying is already in post-inc form, or b) show
// that adding a use must not introduce UB.		// that adding a use must not introduce UB.
if (ICmpInst *LoopTest = getLoopTest(L, ExitingBB))		if (ICmpInst *LoopTest = getLoopTest(L, ExitingBB))
SafeToPostInc = LoopTest->getOperand(0) == IncVar ||		SafeToPostInc = LoopTest->getOperand(0) == IncVar ||
LoopTest->getOperand(1) == IncVar;		LoopTest->getOperand(1) == IncVar;
if (!SafeToPostInc)		if (!SafeToPostInc)
SafeToPostInc =		SafeToPostInc =
mustExecuteUBIfPoisonOnPathTo(IncVar, ExitingBB->getTerminator(), DT);		mustExecuteUBIfPoisonOnPathTo(IncVar, ExitingBB->getTerminator(), DT);
}		}

if (SafeToPostInc) {		if (SafeToPostInc) {
// Add one to the "backedge-taken" count to get the trip count.		UsePostInc = true;
// This addition may overflow, which is valid as long as the comparison
// is truncated to ExitCount->getType().
IVCount = SE->getAddExpr(ExitCount,
SE->getOne(ExitCount->getType()));
// The BackedgeTaken expression contains the number of times that the
// backedge branches to the loop header. This is one less than the
// number of times the loop executes, so use the incremented indvar.
CmpIndVar = IncVar;		CmpIndVar = IncVar;
}		}
}		}

// It may be necessary to drop nowrap flags on the incrementing instruction		// It may be necessary to drop nowrap flags on the incrementing instruction
// if either LFTR moves from a pre-inc check to a post-inc check (in which		// if either LFTR moves from a pre-inc check to a post-inc check (in which
// case the increment might have previously been poison on the last iteration		// case the increment might have previously been poison on the last iteration
// only) or if LFTR switches to a different IV that was previously dynamically		// only) or if LFTR switches to a different IV that was previously dynamically
// dead (and as such may be arbitrarily poison). We remove any nowrap flags		// dead (and as such may be arbitrarily poison). We remove any nowrap flags
// that SCEV didn't infer for the post-inc addrec (even if we use a pre-inc		// that SCEV didn't infer for the post-inc addrec (even if we use a pre-inc
// check), because the pre-inc addrec flags may be adopted from the original		// check), because the pre-inc addrec flags may be adopted from the original
// instruction, while SCEV has to explicitly prove the post-inc nowrap flags.		// instruction, while SCEV has to explicitly prove the post-inc nowrap flags.
// TODO: This handling is inaccurate for one case: If we switch to a		// TODO: This handling is inaccurate for one case: If we switch to a
// dynamically dead IV that wraps on the first loop iteration only, which is		// dynamically dead IV that wraps on the first loop iteration only, which is
// not covered by the post-inc addrec. (If the new IV was not dynamically		// not covered by the post-inc addrec. (If the new IV was not dynamically
// dead, it could not be poison on the first iteration in the first place.)		// dead, it could not be poison on the first iteration in the first place.)
if (auto *BO = dyn_cast<BinaryOperator>(IncVar)) {		if (auto *BO = dyn_cast<BinaryOperator>(IncVar)) {
const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IncVar));		const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IncVar));
if (BO->hasNoUnsignedWrap())		if (BO->hasNoUnsignedWrap())
BO->setHasNoUnsignedWrap(AR->hasNoUnsignedWrap());		BO->setHasNoUnsignedWrap(AR->hasNoUnsignedWrap());
if (BO->hasNoSignedWrap())		if (BO->hasNoSignedWrap())
BO->setHasNoSignedWrap(AR->hasNoSignedWrap());		BO->setHasNoSignedWrap(AR->hasNoSignedWrap());
}		}

Value *ExitCnt = genLoopLimit(IndVar, ExitingBB, IVCount, L, Rewriter, SE);		Value *ExitCnt = genLoopLimit(
		IndVar, ExitingBB, ExitCount, UsePostInc, L, Rewriter, SE);
assert(ExitCnt->getType()->isPointerTy() ==		assert(ExitCnt->getType()->isPointerTy() ==
IndVar->getType()->isPointerTy() &&		IndVar->getType()->isPointerTy() &&
"genLoopLimit missed a cast");		"genLoopLimit missed a cast");

// Insert a new icmp_ne or icmp_eq instruction before the branch.		// Insert a new icmp_ne or icmp_eq instruction before the branch.
BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());		BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());
ICmpInst::Predicate P;		ICmpInst::Predicate P;
if (L->contains(BI->getSuccessor(0)))		if (L->contains(BI->getSuccessor(0)))
P = ICmpInst::ICMP_NE;		P = ICmpInst::ICMP_NE;
else		else
P = ICmpInst::ICMP_EQ;		P = ICmpInst::ICMP_EQ;

IRBuilder<> Builder(BI);		IRBuilder<> Builder(BI);

// The new loop exit condition should reuse the debug location of the		// The new loop exit condition should reuse the debug location of the
// original loop exit condition.		// original loop exit condition.
if (auto *Cond = dyn_cast<Instruction>(BI->getCondition()))		if (auto *Cond = dyn_cast<Instruction>(BI->getCondition()))
Builder.SetCurrentDebugLocation(Cond->getDebugLoc());		Builder.SetCurrentDebugLocation(Cond->getDebugLoc());

// LFTR can ignore IV overflow and truncate to the width of		// LFTR can ignore IV overflow and truncate to the width of
// BECount. This avoids materializing the add(zext(add)) expression.		// ExitCount. This avoids materializing the add(zext(add)) expression.
unsigned CmpIndVarSize = SE->getTypeSizeInBits(CmpIndVar->getType());		unsigned CmpIndVarSize = SE->getTypeSizeInBits(CmpIndVar->getType());
unsigned ExitCntSize = SE->getTypeSizeInBits(ExitCnt->getType());		unsigned ExitCntSize = SE->getTypeSizeInBits(ExitCnt->getType());
if (CmpIndVarSize > ExitCntSize) {		if (CmpIndVarSize > ExitCntSize) {
const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IndVar));		const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(SE->getSCEV(IndVar));
const SCEV *ARStart = AR->getStart();		const SCEV *ARStart = AR->getStart();
const SCEV *ARStep = AR->getStepRecurrence(*SE);		const SCEV *ARStep = AR->getStepRecurrence(*SE);
// For constant IVCount, avoid truncation.		// For constant ExitCount, avoid truncation.
if (isa<SCEVConstant>(ARStart) && isa<SCEVConstant>(IVCount)) {		if (isa<SCEVConstant>(ARStart) && isa<SCEVConstant>(ExitCount)) {
const APInt &Start = cast<SCEVConstant>(ARStart)->getAPInt();		const APInt &Start = cast<SCEVConstant>(ARStart)->getAPInt();
APInt Count = cast<SCEVConstant>(IVCount)->getAPInt();		APInt Count = cast<SCEVConstant>(ExitCount)->getAPInt();
// Note that the post-inc value of ExitCount may have overflowed
// above such that IVCount is now zero.
if (IVCount != ExitCount && Count == 0) {
Count = APInt::getMaxValue(Count.getBitWidth()).zext(CmpIndVarSize);
++Count;
}
else
Count = Count.zext(CmpIndVarSize);		Count = Count.zext(CmpIndVarSize);
		if (UsePostInc)
		++Count;
APInt NewLimit;		APInt NewLimit;
if (cast<SCEVConstant>(ARStep)->getValue()->isNegative())		if (cast<SCEVConstant>(ARStep)->getValue()->isNegative())
NewLimit = Start - Count;		NewLimit = Start - Count;
else		else
NewLimit = Start + Count;		NewLimit = Start + Count;
ExitCnt = ConstantInt::get(CmpIndVar->getType(), NewLimit);		ExitCnt = ConstantInt::get(CmpIndVar->getType(), NewLimit);
} else {		} else {
// We try to extend trip count first. If that doesn't work we truncate IV.		// We try to extend trip count first. If that doesn't work we truncate IV.
[28 lines not shown]
"lftr.wideiv");		"lftr.wideiv");
}		}
}		}
LLVM_DEBUG(dbgs() << "INDVARS: Rewriting loop exit condition to:\n"		LLVM_DEBUG(dbgs() << "INDVARS: Rewriting loop exit condition to:\n"
<< " LHS:" << *CmpIndVar << '\n'		<< " LHS:" << *CmpIndVar << '\n'
<< " op:\t" << (P == ICmpInst::ICMP_NE ? "!=" : "==")		<< " op:\t" << (P == ICmpInst::ICMP_NE ? "!=" : "==")
<< "\n"		<< "\n"
<< " RHS:\t" << *ExitCnt << "\n"		<< " RHS:\t" << *ExitCnt << "\n"
<< " IVCount:\t" << *IVCount << "\n"		<< "ExitCount:\t" << *ExitCount << "\n"
<< " was: " << *BI->getCondition() << "\n");		<< " was: " << *BI->getCondition() << "\n");

Value *Cond = Builder.CreateICmp(P, CmpIndVar, ExitCnt, "exitcond");		Value *Cond = Builder.CreateICmp(P, CmpIndVar, ExitCnt, "exitcond");
Value *OrigCond = BI->getCondition();		Value *OrigCond = BI->getCondition();
// It's tempting to use replaceAllUsesWith here to fully replace the old		// It's tempting to use replaceAllUsesWith here to fully replace the old
// comparison, but that's not immediately safe, since users of the old		// comparison, but that's not immediately safe, since users of the old
// comparison may not be dominated by the new comparison. Instead, just		// comparison may not be dominated by the new comparison. Instead, just
// update the branch to use the new comparison; in the common case this		// update the branch to use the new comparison; in the common case this
[333 lines not shown]

llvm/trunk/test/Transforms/IndVarSimplify/2011-11-01-lftrptr.ll

	[140 lines not shown]
	; PTR64-NEXT: br label [[LOOPGUARD:%.*]]			; PTR64-NEXT: br label [[LOOPGUARD:%.*]]
	; PTR64: loopguard:			; PTR64: loopguard:
	; PTR64-NEXT: [[BI:%.*]] = ptrtoint i8* [[BUF:%.*]] to i32			; PTR64-NEXT: [[BI:%.*]] = ptrtoint i8* [[BUF:%.*]] to i32
	; PTR64-NEXT: [[EI:%.*]] = ptrtoint i8* [[END:%.*]] to i32			; PTR64-NEXT: [[EI:%.*]] = ptrtoint i8* [[END:%.*]] to i32
	; PTR64-NEXT: [[CNT:%.*]] = sub i32 [[EI]], [[BI]]			; PTR64-NEXT: [[CNT:%.*]] = sub i32 [[EI]], [[BI]]
	; PTR64-NEXT: [[GUARD:%.*]] = icmp ult i32 0, [[CNT]]			; PTR64-NEXT: [[GUARD:%.*]] = icmp ult i32 0, [[CNT]]
	; PTR64-NEXT: br i1 [[GUARD]], label [[PREHEADER:%.*]], label [[EXIT:%.*]]			; PTR64-NEXT: br i1 [[GUARD]], label [[PREHEADER:%.*]], label [[EXIT:%.*]]
	; PTR64: preheader:			; PTR64: preheader:
	; PTR64-NEXT: [[TMP1:%.*]] = zext i32 [[CNT]] to i64			; PTR64-NEXT: [[TMP1:%.*]] = add i32 [[EI]], -1
	; PTR64-NEXT: [[LFTR_LIMIT:%.*]] = getelementptr i8, i8* null, i64 [[TMP1]]			; PTR64-NEXT: [[TMP2:%.*]] = sub i32 [[TMP1]], [[BI]]
				; PTR64-NEXT: [[TMP3:%.*]] = zext i32 [[TMP2]] to i64
				; PTR64-NEXT: [[TMP4:%.*]] = add nuw nsw i64 [[TMP3]], 1
				; PTR64-NEXT: [[LFTR_LIMIT:%.*]] = getelementptr i8, i8* null, i64 [[TMP4]]
	; PTR64-NEXT: br label [[LOOP:%.*]]			; PTR64-NEXT: br label [[LOOP:%.*]]
	; PTR64: loop:			; PTR64: loop:
	; PTR64-NEXT: [[P_01_US_US:%.*]] = phi i8* [ null, [[PREHEADER]] ], [ [[GEP:%.*]], [[LOOP]] ]			; PTR64-NEXT: [[P_01_US_US:%.*]] = phi i8* [ null, [[PREHEADER]] ], [ [[GEP:%.*]], [[LOOP]] ]
	; PTR64-NEXT: [[GEP]] = getelementptr inbounds i8, i8* [[P_01_US_US]], i64 1			; PTR64-NEXT: [[GEP]] = getelementptr inbounds i8, i8* [[P_01_US_US]], i64 1
	; PTR64-NEXT: [[SNEXT:%.*]] = load i8, i8* [[GEP]]			; PTR64-NEXT: [[SNEXT:%.*]] = load i8, i8* [[GEP]]
	; PTR64-NEXT: [[EXITCOND:%.*]] = icmp ne i8* [[GEP]], [[LFTR_LIMIT]]			; PTR64-NEXT: [[EXITCOND:%.*]] = icmp ne i8* [[GEP]], [[LFTR_LIMIT]]
	; PTR64-NEXT: br i1 [[EXITCOND]], label [[LOOP]], label [[EXIT_LOOPEXIT:%.*]]			; PTR64-NEXT: br i1 [[EXITCOND]], label [[LOOP]], label [[EXIT_LOOPEXIT:%.*]]
	; PTR64: exit.loopexit:			; PTR64: exit.loopexit:
	[192 lines not shown]

llvm/trunk/test/Transforms/IndVarSimplify/lftr-pr41998.ll

	[36 lines not shown]
	}			}

	@data = global [256 x i8] zeroinitializer			@data = global [256 x i8] zeroinitializer

	define void @test_ptr(i32 %start) {			define void @test_ptr(i32 %start) {
	; CHECK-LABEL: @test_ptr(			; CHECK-LABEL: @test_ptr(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[START:%.*]] to i3			; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[START:%.*]] to i3
	; CHECK-NEXT: [[TMP1:%.*]] = sub i3 0, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = sub i3 -1, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = zext i3 [[TMP1]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = zext i3 [[TMP1]] to i64
	; CHECK-NEXT: [[LFTR_LIMIT:%.*]] = getelementptr i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), i64 [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
				; CHECK-NEXT: [[LFTR_LIMIT:%.*]] = getelementptr i8, i8* getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), i64 [[TMP3]]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[P:%.*]] = phi i8* [ getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), [[ENTRY:%.*]] ], [ [[P_INC:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[P:%.*]] = phi i8* [ getelementptr inbounds ([256 x i8], [256 x i8]* @data, i64 0, i64 0), [[ENTRY:%.*]] ], [ [[P_INC:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[P_INC]] = getelementptr inbounds i8, i8* [[P]], i64 1			; CHECK-NEXT: [[P_INC]] = getelementptr inbounds i8, i8* [[P]], i64 1
	; CHECK-NEXT: store volatile i8 0, i8* [[P_INC]]			; CHECK-NEXT: store volatile i8 0, i8* [[P_INC]]
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i8* [[P_INC]], [[LFTR_LIMIT]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i8* [[P_INC]], [[LFTR_LIMIT]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[LOOP]]
	; CHECK: end:			; CHECK: end:
	[18 lines not shown]