This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
include/llvm/ADT/
-
llvm/
-
ADT/
-
STLExtras.h
-
lib/Transforms/Scalar/
-
Transforms/
-
Scalar/
4
IndVarSimplify.cpp
-
test/Transforms/IndVarSimplify/
-
Transforms/
-
IndVarSimplify/
1
PR31181.ll
-
widen-nsw.ll

Differential D30446

[IndVars] Do not branch on poison
AbandonedPublic

Authored by sanjoy on Feb 27 2017, 10:56 PM.

Download Raw Diff

Details

Reviewers

majnemer
efriedma

Summary

Earlier, LFTR would introduce branches on values that could
potentially be poison. This change fixes the issue by demoting the
value LFTR is about to introduce a branch on to a non-poison variant
of itself whenever it cannot statically prove that it isn't poison.

Fixes PR31181.

Diff Detail

Event Timeline

sanjoy created this revision.Feb 27 2017, 10:56 PM

Herald added a subscriber: mcrosier. · View Herald TranscriptFeb 27 2017, 10:56 PM

sanjoy added parent revisions: D30445: [ValueTracking] Add a isIVNeverPoison helper, D30444: Rename isKnownNotFullPoison to programUndefinedIfPoison; NFC, D30443: Add a propagateKnownNonPoison helper.Mar 14 2017, 4:12 PM

ping (on this and all the patches this one depends on)

I guess first off, I'd like to take a step back and ask: do we need to do this transform at all? We're using some very complicated logic to conditionally rewrite an induction variable in terms of another induction variable... and then LSR will rewrite all the induction variables from scratch. This also seems difficult to test effectively: if we incorrectly rewrite an induction variable to something which could be poison, it'll almost always work, except in obscure cases.

In D30446#702025, @efriedma wrote:

I guess first off, I'd like to take a step back and ask: do we need to do this transform at all? We're using some very complicated logic to conditionally rewrite an induction variable in terms of another induction variable... and then LSR will rewrite all the induction variables from scratch. This also seems difficult to test effectively: if we incorrectly rewrite an induction variable to something which could be poison, it'll almost always work, except in obscure cases.

I see this transform as a "canonicalization" type transform (supposed to make later loop opts easier), but I did not add it. Maybe @atrick can shed some light here on the initial motivation? I'm most certainly in favor of deleting code where possible.

Having said that, I don't think this is the only place in LLVM where the "it'll almost always work, except in obscure cases" adage applies, so that alone can't be the reason for excising this.

I see this transform as a "canonicalization" type transform (supposed to make later loop opts easier), but I did not add it. Maybe @atrick can shed some light here on the initial motivation? I'm most certainly in favor of deleting code where possible.

I don't know of any good reason to run linearFunctionTestReplace during IndVarSimplify. I think it should be part of LSR's OptimizeLoopTermCond(), which is the only thing that depends on it now AFAIK. I suppose it's possible that some loop vectorizer code relies on LFTR.

If SCEV provides the same information before and after LFTR, then it doesn't seem like a useful canonicalization.

Replacing a quadratic loop test (i*i < N) with a linear loop test does seem like a useful simplification, but I suspect it's exceedingly rare.

I'm not sure how moving it to LSR would help you deal with poison on the backedge though. If you completely remove LFTR then

We lose the quadratic to linear optimization (the original meaning of LFTR)
I think LSR will fail to generate certain expected patterns like counting down to zero

The part that's actually making this code complicated is the attempt to reuse an existing induction variable. Hopefully we don't have to worry about the trip count itself being poison?

Since LSR makes new induction variables anyway, I think the problem this patch is trying to solve doesn't exist if we transform the loop test in LSR. And it would be a bit more powerful there, because we could transform the loop test whether or not the loop actually has a canonical induction variable.

Materializing a new IV isn't a good canonicalization. It may block other loop transformations and there's no guarantee that LSR will undo it. General rule: never pessimize your loop and expect LSR to rewrite it. LSR only works well on a small subset of loops and essentially gives up in a lot of cases. I agree the complexity smells bad. I don't have a better suggestion at the moment.

Err, oops, I can see how that could be misread. I wasn't suggesting we should make a new IV in LSR; I was suggesting we could more aggressively use existing IVs because LSR will reconstruct the underlying IR.

When I talk about creating or removing an IV, I'm talking about the strongly connected phi cycle. I'm not referring to any "derived IV" stemming from values that depend on the IV.

I found the dependent patches and roughly figured out what's hapenning here. It seems like we could leave the original IV's nw flags in place and simply introduce an add without the nuw flags for the branch's post-increment. Maybe that's what Eli is suggesting? I think that would result in a nicely canonical form, but it may require adjusting downstream passes. I think the risk of interfering with downstream passes could be reduced by moving all of LFTR into LSR.

In D30446#702790, @atrick wrote:

When I talk about creating or removing an IV, I'm talking about the strongly connected phi cycle. I'm not referring to any "derived IV" stemming from values that depend on the IV.

I found the dependent patches and roughly figured out what's hapenning here. It seems like we could leave the original IV's nw flags in place and simply introduce an add without the nuw flags for the branch's post-increment. Maybe that's what Eli is suggesting? I think that would result in a nicely canonical form, but it may require adjusting downstream passes. I think the risk of interfering with downstream passes could be reduced by moving all of LFTR into LSR.

It isn't just that, but also cases like this:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

;; Slightly edited version of the @f_1 function in the test case, but as I've noted
;; the test case in the patch is broken and needs to be fixed to be the same as this.
define i32 @f_1(i32* %ptr, i1 %always_false) {
entry:
  br label %for.cond

for.cond:
  %iv = phi i32 [ -2147483648, %entry ], [ %iv.inc, %always_taken ]
  %iv2 = phi i32 [ 0, %entry ], [ %iv2.inc, %always_taken ]
  store i32 %iv, i32* %ptr
  br i1 %always_false, label %never_taken, label %always_taken

never_taken:
  store volatile i32 %iv2, i32* %ptr
  br label %always_taken

always_taken:
  %iv.inc = add nsw i32 %iv, 1
  %iv2.inc = add nsw i32 %iv2, 1
  %cmp = icmp slt i32 %iv, 20
  br i1 %cmp, label %for.cond, label %for.end

for.end:
  ret i32 0
}

where LFTR rewrites the exit condition to use %iv2.inc (a "different" induction variable) which may be poison (and is poison in the last two iterations in this case).

Hanging the comparison off of a non-nsw/nuw increment of %iv2 won't help since by the last iteration %iv2 is poison, so a non-nsw/nuw addition will propagate that poison.

test/Transforms/IndVarSimplify/PR31181.ll
51	This test case is wrong -- it the increment on `%iv.inc` should just be `nsw` and not `nuw`.

Ok. I thought the problem was caused by converting the loop test to a post-increment compare. Is that a problem?

The problem you've been talking about is caused by FindLoopCounter. This is mostly about setting up a specific pattern of loop for count-down-to-zero codegen. I wanted to make sure we don't create additional IVs in the process. A side effect is that we can reduce the number of IVs for some loops as a "nice" simplification. That doesn't seem worth doing if we need to erase flags. Why doesn't FindLoopCounter just check isIVNeverPoison?

At any rate, FindLoopCounter is making the wrong choice here. I don't understand why it's picking an "AlmostDead" IV over an IV that starts at non-zero.

In D30446#702912, @atrick wrote:

Ok. I thought the problem was caused by converting the loop test to a post-increment compare. Is that a problem?

Yes, but that isn't the only problem.

The problem you've been talking about is caused by FindLoopCounter.

Yes.

This is mostly about setting up a specific pattern of loop for count-down-to-zero codegen. I wanted to make sure we don't create additional IVs in the process. A side effect is that we can reduce the number of IVs for some loops as a "nice" simplification. That doesn't seem worth doing if we need to erase flags. Why doesn't FindLoopCounter just check isIVNeverPoison?

That would be correct, but it pessimizes the case where it would have been okay, for the various reasons outlined in the code for this patch, to nevertheless emit a branch on a potentially poison IV.

At any rate, FindLoopCounter is making the wrong choice here. I don't understand why it's picking an "AlmostDead" IV over an IV that starts at non-zero.

Are you talking about the example I pasted above? Then the IV picked by FindLoopCounter is not AlmostDead, since it (deliberately) has a use in the never_taken block.

I was referring to the same example but missed that there were two stores. So FindLoopCounter is working as intended.

I don't think that dropping flags is a good tradeoff for counting from zero. Especially this early in the pipeline. Beyond that, I'll leave it up to you!

Fix test case

Hi Andy,

I'd like to revive this patch. I tried what you suggested -- I taught LFTR to bail out of the transform if it would have to drop no-wrap flags. With that change, the following tests fail:

LLVM :: Transforms/IndVarSimplify/2011-10-27-lftrnull.ll
LLVM :: Transforms/IndVarSimplify/2011-11-01-lftrptr.ll
LLVM :: Transforms/IndVarSimplify/PR31181.ll
LLVM :: Transforms/IndVarSimplify/lftr-udiv-tripcount.ll
LLVM :: Transforms/IndVarSimplify/lftr-zext.ll
LLVM :: Transforms/IndVarSimplify/lftr_simple.ll
LLVM :: Transforms/IndVarSimplify/widen-nsw.ll

If that seems reasonable (i.e. pessimizing the cases tested in the tests above seem okay), I'll go ahead and change makeValueNonPoisonOnBackEdge to not drop no-wrap flags.

I'd also appreciate it someone takes a look at the dependencies of this patch.

The other option would be to keep LFTR aggressive (i.e. have it drop no-wrap flags if needed), but to do that only if the backedge condition is quadratic.

My suggestion was this: LFTR is not necessary this early in the pipeline if it's only purpose is converting a 'cmp slt' to 'cmp eq'. As long as SCEV can compute the trip count, the loop test shouldn't matter. If we really need to convert the loop test, it could be done during LSR and we could drop the 'nw' flags at that point.

We still want to remove sext/zext by widening, remove quadratic IVs, etc. It's not clear to me why we need to pessimize those particular test cases. In all those cases, is the problem that the rewritten loop test uses a possibly-poison IV? I don't even see nsw/nuw flags in the tests.

@mzolotukhin, if we do need to pessimize any of these cases, can you evaluate the performance impact, or suggest someone who can?

In D30446#742283, @atrick wrote:

My suggestion was this: LFTR is not necessary this early in the pipeline if it's only purpose is converting a 'cmp slt' to 'cmp eq'. As long as SCEV can compute the trip count, the loop test shouldn't matter. If we really need to convert the loop test, it could be done during LSR and we could drop the 'nw' flags at that point.

We still want to remove sext/zext by widening, remove quadratic IVs, etc.

I think we're on the same page then (https://reviews.llvm.org/D30446#741950).

It's not clear to me why we need to pessimize those particular test cases. In all those cases, is the problem that the rewritten loop test uses a possibly-poison IV? I don't even see nsw/nuw flags in the tests.

I'll have to check case by case, but in several tests I see an inbounds GEP, which can also generate poison.

@mzolotukhin, if we do need to pessimize any of these cases, can you evaluate the performance impact, or suggest someone who can?

I can provide a version of this patch that avoids stripping no-wrap flags if that helps. Alternatively, you can remove the code paths that call dropPoisonGeneratingFlags and get the same effect.

We just ran across this again, with a bit more of an urgent motivation this time. I'd very much like to see this patch make progress.

If I follow the discussion so far, it basically comes down to:

We should factor out the LFTR code such that it can be used by LSR as well as IndVarSimplify.
Once done, we should restrict the IndVar version to "do no harm". Specifically, this means that we do *not* strip nsw/nuw flags, we simply don't convert in that case.
The LSR version should use the "strip flags if needed" strategy.

Is everyone comfortable with that direction?

Looking at a few of the test cases Sanjoy mentioned, I think we could also trivially re-infer flags in several of the cases. We could potentially integrate that logic into the IndVar version of the code if desired - i.e. trigger if either no-nsw, or could trivially infer nsw on IV w/new use added.

lib/Transforms/Scalar/IndVarSimplify.cpp
2174	Ok, seeing the usage here, I think this is probably the wrong interface for the problem even if we want to pursue this approach. I think we'd be much better off with an analysis which provided an interface for asking whether a given Instruction is known to be non-poison. Your IV analysis could be used, but you could also walk forward through uses to by-assumption non-poison use as well.
2193	minor: can't we do this before the more general search since we already have the information? Or are you trying to check where the new IV itself is known non-poison?

As a generally good direction I like the idea of moving LFTR, and eventually even IV widening, forward to the LSR stage of the pipeline.
So, I'm ok with it as long as you do the normal amount of benchmark validation and @mzolotukhin doesn't see any immediate performance issues with this approach.

Sorry for the delay, I missed this thread somehow.
I'm running tests now, and publish the results as soon as I get them.

Michael

I ran SPEC2006 and CTMark with and without the patches and didn't see any performance difference.

Michael

Thanks for doing the performance validation @mzolotukhin.
Nice code @sanjoy.

sanjoy mentioned this in D30443: Add a propagateKnownNonPoison helper.Jun 17 2017, 3:17 PM

sanjoy added inline comments.Jun 17 2017, 3:45 PM

lib/Transforms/Scalar/IndVarSimplify.cpp
2174	I think a context-free analysis with a simpler interface will be less powerful (if I understand your suggestion correctly). For instance, say we have: loop: %iv = phi ... %iv.inc = add nsw i32 %iv, ... %cmp = %iv.inc s< ... call void @may_exit() br i1 %cmp, ... An analysis can't correctly return true for `isKnownNonPoison(%iv.inc)` since it is possible for it to be poison and have `@may_exit` call `exit(0)` to avoid the UB due to branching on poison. However, if we know the expression we're about to generate needs to be non-poison only at the point of the existing backedge, we can exploit the fact that the original condition for the branch is known to be not poison at that location, which is what I'm trying to do via `propagateKnownNonPoison`.
2193	I certainly want to first check if it is possible to get away without calling `dropPoisonGeneratingFlags` on anything at all. However, I don't remember if there was a good reason to have `PoisonGenerator->dropPoisonGeneratingFlags()` before `IV->dropPoisonGeneratingFlags()`. One (somewhat weak) reason to do this: when I do `IV->dropPoisonGeneratingFlags()` I only affect stuff within the loop[0] I'm trying to optimize but when I do `PoisonGenerator->dropPoisonGeneratingFlags()` I may affect larger scope of things. [0]: This is not really true (I affect stuff that uses the exit value of the IV, for one thing), which is why this line of reasoning is weak.

fedor.sergeev mentioned this in D60935: [IndVarSimplify] Fixup nowrap flags during LFTR when moving to post-inc (PR31181).May 8 2019, 1:27 AM

sanjoy abandoned this revision.Jan 29 2022, 5:37 PM

Herald added a subscriber: bixia. · View Herald TranscriptJan 29 2022, 5:37 PM

Revision Contents

Path

Size

include/

llvm/

ADT/

STLExtras.h

11 lines

lib/

Transforms/

Scalar/

IndVarSimplify.cpp

77 lines

test/

Transforms/

IndVarSimplify/

PR31181.ll

114 lines

widen-nsw.ll

2 lines

Diff 92252

include/llvm/ADT/STLExtras.h

	Show First 20 Lines • Show All 853 Lines • ▼ Show 20 Lines
	/// Wrapper function around std::find to detect if an element exists			/// Wrapper function around std::find to detect if an element exists
	/// in a container.			/// in a container.
	template <typename R, typename E>			template <typename R, typename E>
	bool is_contained(R &&Range, const E &Element) {			bool is_contained(R &&Range, const E &Element) {
	return std::find(std::begin(Range), std::end(Range), Element) !=			return std::find(std::begin(Range), std::end(Range), Element) !=
	std::end(Range);			std::end(Range);
	}			}

				/// Wrapper function around std::lower_bound to detect if an element
				/// exists in a container via binary search. The same requirements as
				/// for std::lower_bound (that is: Range must be partially ordered,
				/// with respect to the expression Element < Value for Value in Range)
				/// applies here.
				template <typename R, typename E>
				bool binary_search(R &&Range, const E &Element) {
				auto It = std::lower_bound(std::begin(Range), std::end(Range), Element);
				return It != std::end(Range) && *It == Element;
				}

	/// Wrapper function around std::count to count the number of times an element			/// Wrapper function around std::count to count the number of times an element
	/// \p Element occurs in the given range \p Range.			/// \p Element occurs in the given range \p Range.
	template <typename R, typename E>			template <typename R, typename E>
	auto count(R &&Range, const E &Element) -> typename std::iterator_traits<			auto count(R &&Range, const E &Element) -> typename std::iterator_traits<
	decltype(std::begin(Range))>::difference_type {			decltype(std::begin(Range))>::difference_type {
	return std::count(std::begin(Range), std::end(Range), Element);			return std::count(std::begin(Range), std::end(Range), Element);
	}			}

	▲ Show 20 Lines • Show All 250 Lines • Show Last 20 Lines

lib/Transforms/Scalar/IndVarSimplify.cpp

Show All 24 Lines
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar/IndVarSimplify.h"		#include "llvm/Transforms/Scalar/IndVarSimplify.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"		#include "llvm/Analysis/LoopPass.h"
		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"		#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
#include "llvm/Analysis/ScalarEvolutionExpander.h"		#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
▲ Show 20 Lines • Show All 2,090 Lines • ▼ Show 20 Lines	if (IndVar->getType()->isPointerTy() && !IVCount->getType()->isPointerTy()) {
// type. In the presence of null pointer values, we have an integer type		// type. In the presence of null pointer values, we have an integer type
// SCEV expression (IVInit) for a pointer type IV value (IndVar).		// SCEV expression (IVInit) for a pointer type IV value (IndVar).
Type *LimitTy = IVCount->getType()->isPointerTy() ?		Type *LimitTy = IVCount->getType()->isPointerTy() ?
IndVar->getType() : IVCount->getType();		IndVar->getType() : IVCount->getType();
return Rewriter.expandCodeFor(IVLimit, LimitTy, BI);		return Rewriter.expandCodeFor(IVLimit, LimitTy, BI);
}		}
}		}

		/// We want to insert a branch on IV on the backedge of L. Return true if we
		/// can legally do that (branching on poison is undefined behavior), modifying
		/// instructions if necessary.
		static bool makeValueNonPoisonOnBackEdge(Instruction IV, Loop L) {
		auto IVNeverPoison = isIVNeverPoison(IV, L);
		if (IVNeverPoison.first)
		return true;

		Instruction *PoisonGenerator =
		const_cast<Instruction *>(IVNeverPoison.second);

		// If PoisonGenerator is non-null then we can call
		// dropPoisonGeneratingFlags on it to make IV not poison. However,
		// let's first check if we can avoid dropping no-wrap flags at all.

		Value *BECond = nullptr;
		auto *LatchTerminator = L->getLoopLatch()->getTerminator();
		if (auto *BI = dyn_cast<BranchInst>(LatchTerminator)) {
		if (BI->isConditional())
		BECond = dyn_cast<Instruction>(BI->getCondition());
		} else if (auto *SI = dyn_cast<SwitchInst>(LatchTerminator)) {
		BECond = dyn_cast<Instruction>(SI->getCondition());
		}

		if (BECond) {
		auto InLoop = [&](const Value *V) {
		auto *I = dyn_cast<Instruction>(V);
		return I && L->contains(I);
		};

		// We know that at the backedge we will branch on BECond, which in turn
		// means that BECond itself is not poison. See if that knowledge amounts to
		// something.
		SmallVector<const Value *, 8> NotPoison;
		propagateKnownNonPoison(BECond, NotPoison, InLoop);
		reamesUnsubmitted Not Done Reply Inline Actions Ok, seeing the usage here, I think this is probably the wrong interface for the problem even if we want to pursue this approach. I think we'd be much better off with an analysis which provided an interface for asking whether a given Instruction is known to be non-poison. Your IV analysis could be used, but you could also walk forward through uses to by-assumption non-poison use as well. reames: Ok, seeing the usage here, I think this is probably the wrong interface for the problem even if…
		sanjoyAuthorUnsubmitted Not Done Reply Inline Actions I think a context-free analysis with a simpler interface will be less powerful (if I understand your suggestion correctly). For instance, say we have: loop: %iv = phi ... %iv.inc = add nsw i32 %iv, ... %cmp = %iv.inc s< ... call void @may_exit() br i1 %cmp, ... An analysis can't correctly return true for `isKnownNonPoison(%iv.inc)` since it is possible for it to be poison and have `@may_exit` call `exit(0)` to avoid the UB due to branching on poison. However, if we know the expression we're about to generate needs to be non-poison only at the point of the existing backedge, we can exploit the fact that the original condition for the branch is known to be not poison at that location, which is what I'm trying to do via `propagateKnownNonPoison`. sanjoy: I think a context-free analysis with a simpler interface will be less powerful (if I understand…

		array_pod_sort(NotPoison.begin(), NotPoison.end());

		if (binary_search(NotPoison, IV))
		return true;

		auto IsNonPoison = [&](Value *V) {
		return isa<ConstantInt>(V) \|\| binary_search(NotPoison, V);
		};

		if (all_of(IV->operands(), IsNonPoison)) {
		// All of the inputs to IV are non-poison. This means I can make IV
		// non-poison by preventing it from generating new poison.
		IV->dropPoisonGeneratingFlags();
		return true;
		}
		}

		if (PoisonGenerator) {
		reamesUnsubmitted Not Done Reply Inline Actions minor: can't we do this before the more general search since we already have the information? Or are you trying to check where the new IV itself is known non-poison? reames: minor: can't we do this before the more general search since we already have the information?
		sanjoyAuthorUnsubmitted Not Done Reply Inline Actions I certainly want to first check if it is possible to get away without calling `dropPoisonGeneratingFlags` on anything at all. However, I don't remember if there was a good reason to have `PoisonGenerator->dropPoisonGeneratingFlags()` before `IV->dropPoisonGeneratingFlags()`. One (somewhat weak) reason to do this: when I do `IV->dropPoisonGeneratingFlags()` I only affect stuff within the loop[0] I'm trying to optimize but when I do `PoisonGenerator->dropPoisonGeneratingFlags()` I may affect larger scope of things. [0]: This is not really true (I affect stuff that uses the exit value of the IV, for one thing), which is why this line of reasoning is weak. sanjoy: I certainly want to first check if it is possible to get away without calling…
		PoisonGenerator->dropPoisonGeneratingFlags();
		return true;
		}

		return false;
		}

/// This method rewrites the exit condition of the loop to be a canonical !=		/// This method rewrites the exit condition of the loop to be a canonical !=
/// comparison against the incremented loop induction variable. This pass is		/// comparison against the incremented loop induction variable. This pass is
/// able to rewrite the exit tests of any loop where the SCEV analysis can		/// able to rewrite the exit tests of any loop where the SCEV analysis can
/// determine a loop-invariant trip count of the loop, which is actually a much		/// determine a loop-invariant trip count of the loop, which is actually a much
/// broader range than just linear tests.		/// broader range than just linear tests.
Value *IndVarSimplify::		Value *IndVarSimplify::
linearFunctionTestReplace(Loop *L,		linearFunctionTestReplace(Loop *L,
const SCEV *BackedgeTakenCount,		const SCEV *BackedgeTakenCount,
PHINode *IndVar,		PHINode *IndVar,
SCEVExpander &Rewriter) {		SCEVExpander &Rewriter) {
assert(canExpandBackedgeTakenCount(L, SE, Rewriter) && "precondition");		assert(canExpandBackedgeTakenCount(L, SE, Rewriter) && "precondition");

// Initialize CmpIndVar and IVCount to their preincremented values.		// Initialize CmpIndVar and IVCount to their preincremented values.
Value *CmpIndVar = IndVar;		Instruction *CmpIndVar = IndVar;
const SCEV *IVCount = BackedgeTakenCount;		const SCEV *IVCount = BackedgeTakenCount;

assert(L->getLoopLatch() && "Loop no longer in simplified form?");		assert(L->getLoopLatch() && "Loop no longer in simplified form?");

// If the exiting block is the same as the backedge block, we prefer to		// If the exiting block is the same as the backedge block, we prefer to
// compare against the post-incremented value, otherwise we must compare		// compare against the post-incremented value, otherwise we must compare
// against the preincremented value.		// against the preincremented value.
if (L->getExitingBlock() == L->getLoopLatch()) {		if (L->getExitingBlock() == L->getLoopLatch()) {
// Add one to the "backedge-taken" count to get the trip count.		// Add one to the "backedge-taken" count to get the trip count.
// This addition may overflow, which is valid as long as the comparison is		// This addition may overflow, which is valid as long as the comparison is
// truncated to BackedgeTakenCount->getType().		// truncated to BackedgeTakenCount->getType().
IVCount = SE->getAddExpr(BackedgeTakenCount,		IVCount = SE->getAddExpr(BackedgeTakenCount,
SE->getOne(BackedgeTakenCount->getType()));		SE->getOne(BackedgeTakenCount->getType()));
// The BackedgeTaken expression contains the number of times that the		// The BackedgeTaken expression contains the number of times that the
// backedge branches to the loop header. This is one less than the		// backedge branches to the loop header. This is one less than the
// number of times the loop executes, so use the incremented indvar.		// number of times the loop executes, so use the incremented indvar.
CmpIndVar = IndVar->getIncomingValueForBlock(L->getExitingBlock());		CmpIndVar = cast<Instruction>(
		IndVar->getIncomingValueForBlock(L->getExitingBlock()));
}		}

		// In general, we cannot introduce a branch on poison where there
		// wasn't one before. Try a variety of tricks to see if branching
		// on CmpIndVar can be made legal.
		if (!makeValueNonPoisonOnBackEdge(CmpIndVar, L))
		return nullptr;

Value *ExitCnt = genLoopLimit(IndVar, IVCount, L, Rewriter, SE);		Value *ExitCnt = genLoopLimit(IndVar, IVCount, L, Rewriter, SE);
assert(ExitCnt->getType()->isPointerTy() ==		assert(ExitCnt->getType()->isPointerTy() ==
IndVar->getType()->isPointerTy() &&		IndVar->getType()->isPointerTy() &&
"genLoopLimit missed a cast");		"genLoopLimit missed a cast");

// Insert a new icmp_ne or icmp_eq instruction before the branch.		// Insert a new icmp_ne or icmp_eq instruction before the branch.
BranchInst *BI = cast<BranchInst>(L->getExitingBlock()->getTerminator());		BranchInst *BI = cast<BranchInst>(L->getExitingBlock()->getTerminator());
ICmpInst::Predicate P;		ICmpInst::Predicate P;
▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	if (isa<SCEVConstant>(ARStart) && isa<SCEVConstant>(IVCount)) {
if (SExtTrunc == IV) {		if (SExtTrunc == IV) {
Extended = true;		Extended = true;
ExitCnt = Builder.CreateSExt(ExitCnt, IndVar->getType(),		ExitCnt = Builder.CreateSExt(ExitCnt, IndVar->getType(),
"wide.trip.count");		"wide.trip.count");
}		}
}		}

if (!Extended)		if (!Extended)
CmpIndVar = Builder.CreateTrunc(CmpIndVar, ExitCnt->getType(),		CmpIndVar = cast<Instruction>(
"lftr.wideiv");		Builder.CreateTrunc(CmpIndVar, ExitCnt->getType(), "lftr.wideiv"));
}		}
}		}
Value *Cond = Builder.CreateICmp(P, CmpIndVar, ExitCnt, "exitcond");		Value *Cond = Builder.CreateICmp(P, CmpIndVar, ExitCnt, "exitcond");
Value *OrigCond = BI->getCondition();		Value *OrigCond = BI->getCondition();
// It's tempting to use replaceAllUsesWith here to fully replace the old		// It's tempting to use replaceAllUsesWith here to fully replace the old
// comparison, but that's not immediately safe, since users of the old		// comparison, but that's not immediately safe, since users of the old
// comparison may not be dominated by the new comparison. Instead, just		// comparison may not be dominated by the new comparison. Instead, just
// update the branch to use the new comparison; in the common case this		// update the branch to use the new comparison; in the common case this
▲ Show 20 Lines • Show All 278 Lines • Show Last 20 Lines

test/Transforms/IndVarSimplify/PR31181.ll

This file was added.

				; RUN: opt -S -indvars < %s \| FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				define i32 @f_0(i32* %ptr) {
				; CHECK-LABEL: @f_0(
				; CHECK: for.cond:
				; CHECK: %iv = phi i32 [ -2, %entry ], [ %iv.inc, %for.cond ]
				; CHECK: %iv.inc = add i32 %iv, 1
				; CHECK: %exitcond = icmp ne i32 %iv.inc, 0
				; CHECK: br i1 %exitcond, label %for.cond, label %for.end
				;
				entry:
				br label %for.cond

				for.cond:
				%iv = phi i32 [ -2, %entry ], [ %iv.inc, %for.cond ]
				store i32 %iv, i32* %ptr
				%cmp = icmp slt i32 %iv, -1
				%iv.inc = add nuw nsw i32 %iv, 1
				br i1 %cmp, label %for.cond, label %for.end

				for.end:
				ret i32 0
				}

				define i32 @f_1(i32* %ptr, i1 %always_false) {
				; CHECK-LABEL: @f_1(
				; CHECK: always_taken:
				; CHECK: %iv.inc = add nsw i32 %iv, 1
				; CHECK: %iv2.inc = add i32 %iv2, 1
				; CHECK: %exitcond = icmp ne i32 %iv2.inc, -2147483627
				; CHECK: br i1 %exitcond, label %for.cond, label %for.end

				entry:
				br label %for.cond

				for.cond:
				%iv = phi i32 [ -2147483648, %entry ], [ %iv.inc, %always_taken ]
				%iv2 = phi i32 [ 0, %entry ], [ %iv2.inc, %always_taken ]
				store i32 %iv, i32* %ptr
				br i1 %always_false, label %never_taken, label %always_taken

				never_taken:
				store volatile i32 %iv2, i32* %ptr
				br label %always_taken

				always_taken:
				%iv.inc = add nsw i32 %iv, 1
				%iv2.inc = add nsw i32 %iv2, 1
				sanjoyAuthorUnsubmitted Not Done Reply Inline Actions This test case is wrong -- it the increment on `%iv.inc` should just be `nsw` and not `nuw`. sanjoy: This test case is wrong -- it the increment on `%iv.inc` should just be `nsw` and not `nuw`.
				%cmp = icmp slt i32 %iv, 20
				br i1 %cmp, label %for.cond, label %for.end

				for.end:
				ret i32 0
				}

				define i32 @f_2(i32* %ptr, i1 %always_false) {
				; CHECK-LABEL: @f_2(
				; CHECK: always_taken:
				; CHECK: %iv.inc = add nuw nsw i32 %iv, 1
				; CHECK: %exitcond = icmp ne i32 %iv.inc, 25
				; CHECK: br i1 %exitcond, label %for.cond, label %for.end

				entry:
				br label %for.cond

				for.cond:
				%iv = phi i32 [ 0, %entry ], [ %iv.inc, %always_taken ]
				store i32 %iv, i32* %ptr
				br i1 %always_false, label %never_taken, label %always_taken

				never_taken:
				store volatile i32 %iv, i32* %ptr
				br label %always_taken

				always_taken:
				%iv.inc = add nuw nsw i32 %iv, 1
				%iv.offset = add i32 %iv.inc, -5
				%cmp = icmp slt i32 %iv.offset, 20
				br i1 %cmp, label %for.cond, label %for.end

				for.end:
				ret i32 0
				}

				define i32 @f_3(i32* %ptr, i1 %always_false) {
				; CHECK-LABEL: @f_3(
				; CHECK: always_taken:
				; CHECK: %iv.inc = add i32 %iv, 1
				; CHECK: %exitcond = icmp ne i32 %iv.inc, 21
				; CHECK: br i1 %exitcond, label %for.cond, label %for.end

				entry:
				br label %for.cond

				for.cond:
				%iv = phi i32 [ 0, %entry ], [ %iv.inc, %always_taken ]
				store i32 %iv, i32* %ptr
				br i1 %always_false, label %never_taken, label %always_taken

				never_taken:
				store volatile i32 %iv, i32* %ptr
				br label %always_taken

				always_taken:
				%iv.inc = add nuw nsw i32 %iv, 1
				%cmp = icmp slt i32 %iv, 20
				br i1 %cmp, label %for.cond, label %for.end

				for.end:
				ret i32 0
				}

test/Transforms/IndVarSimplify/widen-nsw.ll

	; RUN: opt < %s -indvars -S \| FileCheck %s			; RUN: opt < %s -indvars -S \| FileCheck %s
	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx"			target triple = "x86_64-apple-macosx"

	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			; CHECK: %indvars.iv.next = add i64 %indvars.iv, 1
	define i32 @test1(i32* %a) #0 {			define i32 @test1(i32* %a) #0 {
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond: ; preds = %for.body, %entry			for.cond: ; preds = %for.body, %entry
	%sum.0 = phi i32 [ 0, %entry ], [ %add, %for.body ]			%sum.0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
	%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%cmp = icmp slt i32 %i.0, 1000			%cmp = icmp slt i32 %i.0, 1000
	Show All 15 Lines