
[LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time
ClosedPublic

Authored by wmi on Feb 24 2017, 1:41 PM.

Details

Summary

In PR32043, we saw a testcase containing an AllFixupsOutsideLoop type LSRUse with a huge SCEVAddExpr. LSRInstance::GenerateReassociations generates lots of new formulae for the LSRUse because of the huge AddExpr, and causes compilation to hang.

Since AllFixupsOutsideLoop type LSRUses are outside of the current loop, reassociation for them should have much less impact compared with that for normal LSRUses. The fix is to add a cap on reassociation when the LSRUse is of AllFixupsOutsideLoop type. I admit this is a workaround; AllFixupsOutsideLoop LSRUses need to be handled in a better way to reduce compile time and improve LSR results.
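A minimal sketch of what such a cap could look like, with the predicate factored out into a hypothetical standalone helper. The real patch checks this inside LSRInstance::GenerateReassociations; the threshold value and all names here are illustrative, not the actual code under review:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical predicate for the cap described above: skip reassociation
// entirely when every fixup of the LSRUse is outside the loop and the add
// expression is large. The threshold of 5 mirrors the value discussed in
// this review but is otherwise an assumption.
bool skipReassociation(bool AllFixupsOutsideLoop, std::size_t NumAddOps) {
  const std::size_t ReassocAddOpsCap = 5;
  return AllFixupsOutsideLoop && NumAddOps > ReassocAddOpsCap;
}
```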

No test, because I am not sure the potentially hanging test from PR32043 is appropriate to add.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi created this revision.Feb 24 2017, 1:41 PM
davide edited edge metadata.Feb 24 2017, 1:55 PM

This is probably OK as a stopgap solution, but I'm generally very wary of adding cutoffs to the code, as they add technical debt, etc.

davide added inline comments.Feb 25 2017, 6:04 PM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
3429 ↗(On Diff #89714)

How did you choose this cap, BTW?
Maybe we should do some measurements and pick a less arbitrary value?

Also, I think the test from the PR (after some polishing, maybe) is good to add. If somebody removes these lines, at least all (or a subset of) the bots will time out and we'll notice the regression.

wmi added a comment.Feb 27 2017, 9:53 AM

I don't like having a hanging test that is killed after a timeout. I will try to create another testcase by checking the LSR trace.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
3429 ↗(On Diff #89714)

On one hand, for AddOps.size() == 5, even without the workaround, the compile time will not blow up too much, so I think the cap can cover the cases suffering from the compile-time issue. On the other hand, not many cases have very large AddOps numbers. I chose 5 because it is not too small a number, so the cap will not kick in frequently. Since the current model handling AllFixupsOutsideLoop is very imprecise, I think it is better to just fix the compile-time issue and not change the existing code.

I found no regressions using some internal benchmarks.

@qcolombet what do you think?

wmi updated this revision to Diff 89938.Feb 27 2017, 2:24 PM

Add a testcase by checking the LSR debug output.

wmi updated this revision to Diff 89939.Feb 27 2017, 2:26 PM

Fixed a typo in the testcase.

sanjoy added a subscriber: sanjoy.Mar 3 2017, 10:45 PM
sanjoy added inline comments.
lib/Transforms/Scalar/LoopStrengthReduce.cpp
3430 ↗(On Diff #89939)

I'm not too familiar with LSR, but this looks pretty ad-hoc -- why can't the same compile-time blowup happen for an LSRUse with AllFixupsOutsideLoop = false?

Wei, any progress on this? We should either commit a workaround or revert the patchset to unblock folks.

atrick edited edge metadata.Mar 8 2017, 1:48 PM

LSR is fundamentally combinatorial. It relies on arbitrary pruning. I don't like it, but that's the way it is, and we need to guard against pathological cases. I just wonder why you completely bypass the reassociation code instead of simply limiting the number of expressions that participate.

wmi added a comment.Mar 8 2017, 2:43 PM

Andy,

Thanks for the comment.

I just wonder why you completely bypass the reassociation code instead of simply limiting the number of expressions that participate.

Because I feel it brings less uncertainty to the LSR result. If I limit the number of expressions that participate, then suppose there is a candidate formula that matters for the final LSR result: whether it will be used depends on when it is seen, which brings some uncertainty to the LSR result.

Another reason is that, for an LSRUse with AllFixupsOutsideLoop = true, I don't feel reassociation of its formulae matters much for performance, since the uses are all outside of the loop. From my understanding, an LSRUse with AllFixupsOutsideLoop = true shouldn't have the same weight as other LSRUses with fixups inside the loop. It is OK for an LSRFixup user outside of the loop to use many registers. That is why I simply chose to bypass the reassociation. If it were an LSRUse with AllFixupsOutsideLoop = false, I feel I would need to be more careful.

Thanks,
Wei.

qcolombet edited edge metadata.Mar 24 2017, 11:58 PM

Hi Wei,

Sorry for the delay; I thought Andy's comment was clear enough for you to explore other directions, so I didn't pay attention to the review.

Anyhow, at this point you have probably guessed it: I agree with Andy. I think this patch is a workaround for a problem we don't yet understand.

I wanted to provide a patch to demonstrate Andy's point, and what I found when playing around is interesting. (Patch attached nonetheless.)

TL;DR: I believe the problem is in the SCEVExpander code and not in LSR per se. I would suggest recreating a profile using opt -loop-reduce and looking at where the time is spent.

Basically, I wanted to limit the number of reassociations per recursive call to 5. The max depth of the call is 3. The call happens for each base register (typically fewer than 3) of all formulae (typically a couple) for each LSRUse.
So that gives us a total of 5 * 3 * 3 * 2 = 90 reassociation formulae per LSRUse, and given that there are typically 4 uses per loop, the grand total is less than 400, which does not sound too bad to explore.
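The back-of-the-envelope count above, written out as arithmetic. Every factor is an assumption stated in the comment (proposed limits and "typical" counts), not a measured value:

```cpp
#include <cassert>

// Quentin's estimate of the reassociation search-space size.
constexpr int ReassocPerCall = 5;  // proposed limit per recursive call
constexpr int MaxDepth = 3;        // max depth of the recursive call
constexpr int BaseRegs = 3;        // base registers per formula (typical)
constexpr int FormulaePerUse = 2;  // formulae per LSRUse (typical)
constexpr int FormulaePerLSRUse =
    ReassocPerCall * MaxDepth * BaseRegs * FormulaePerUse; // 90
constexpr int UsesPerLoop = 4;     // typical uses per loop
constexpr int GrandTotal = FormulaePerLSRUse * UsesPerLoop; // 360 < 400
```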

However, with such a limit, the compile time on the example from PR32043 was already blowing up (I killed it after 4 min).

So I tried with a limit of 2 reassociations per call, i.e., about 36 formulae per LSRUse (to be exact, the example generates 25 formulae), and opt still didn't finish after 4 min.

In the end, the only way to get a reasonable compile time was to not allow any reassociation. I started to suspect that the problem is not the size of the search space, and after doing a quick profile, I found that at least 80% of the compile time is spent materializing the solution in SCEVExpander. The problem seems to be related to the nesting level of the expression of the formula that the solver picks.
Apply my patch and compare the output of lsr-num-reassoc=1 and lsr-num-reassoc=2 to see how the nesting level materializes.

Admittedly my profile may be bogus (my Instruments is quite old and the trace looked weird), but I would nonetheless recommend looking into creating a proper profile and investigating what is going on instead of bypassing the whole reassociation process.

That being said, I also understand that this patch unblocks some people, so I would be okay with you pushing it now as long as you commit to looking at the actual problem in a reasonable time. In other words, if this patch is really needed to unblock people, go for it, but I want a different solution in the near future.

Cheers,
-Quentin

lib/Transforms/Scalar/LoopStrengthReduce.cpp
3430 ↗(On Diff #89939)

I agree with Sanjoy that there is nothing preventing this from happening when AllFixupsOutsideLoop is false.

3429 ↗(On Diff #89714)

What is the highest number you get from the LLVM test suite?
Same for clang self-host?

@qcolombet, thanks for the analysis. It would be even better if we had a fix to the SCEVExpander's pathological behavior. Now I'm afraid this patch just hides the problem for this particular category of test case.

wmi added a comment.Mar 25 2017, 5:53 PM

Quentin, this is a really good finding, thanks a lot! I was misled by the large number of reassociation candidates, and I have verified that the non-linear increase in compile time is indeed because of SCEVExpander!

I dug into it a little and found a problem in SCEV:
For an SCEVAddRecExpr like this:
{{{{{{{{{{{{2002,+,-1}<nsw><%bb5>,+,-1}<nsw><%bb23>,+,-1}<nsw><%bb41>,+,-1}<nsw><%bb59>,+,-1}<nsw><%bb77>,+,-1}<nsw><%bb95>,+,-1}<nsw><%bb113>,+,-1}<nsw><%bb131>,+,-1}<nsw><%bb149>,+,-1}<nsw><%bb167>,+,-1}<nsw><%bb185>,+,-1}<nw><%bb203>

When ScalarEvolution::getZeroExtendExpr is called for a SCEVAddRecExpr, it will try to prove the wrap flag by computing whether sext(start + step * iterations) == sext(start) + sext(step * iterations). That means getZeroExtendExpr will be called at least twice for each level. If the SCEVAddRecExpr above has N levels of nested SCEVAddRecExpr, the complexity is O(2^N). However, if we cache the wrap flag, many of these getZeroExtendExpr calls can be saved.
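The blowup pattern can be modeled with a toy recursion (this is not LLVM code): each nesting level triggers two recursive extension queries of the level below, which is exponential without a cache and linear with one:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of the recursion described above: two recursive calls per
// level, as when getZeroExtendExpr re-proves the wrap flag. Calls counts
// the total number of invocations.
uint64_t extendUncached(int Level, uint64_t &Calls) {
  ++Calls;
  if (Level == 0)
    return 1;
  return extendUncached(Level - 1, Calls) + extendUncached(Level - 1, Calls);
}

// Same recursion, but results per level are memoized, so the second query
// at each level is a cache hit: 2 * Level + 1 calls instead of
// 2^(Level + 1) - 1.
uint64_t extendCached(int Level, uint64_t &Calls,
                      std::map<int, uint64_t> &Cache) {
  ++Calls;
  auto It = Cache.find(Level);
  if (It != Cache.end())
    return It->second;
  uint64_t R = (Level == 0) ? 1
                            : extendCached(Level - 1, Calls, Cache) +
                                  extendCached(Level - 1, Calls, Cache);
  Cache[Level] = R;
  return R;
}
```

For a 12-level nest (like the SCEVAddRecExpr quoted above), the uncached version makes 8191 calls while the cached one makes 25.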

If the result of the wrap-flag analysis for a SCEVAddRecExpr is FlagNSW/FlagNUW, it will be recorded. If the result is FlagAnyWrap, it will be computed again next time. A problem with the current implementation is that FlagAnyWrap cannot tell us whether this is the first time the wrap flag of the SCEV has been analyzed.

I tried a hack: adding a FlagMayWrap to SCEV::NoWrapFlags. When we have done the analysis for a SCEVAddRecExpr and still have no idea whether it will wrap, we set it to FlagMayWrap. Then we can skip the wrap analysis for any SCEV with a flag other than FlagAnyWrap. I find the hack solves the compile-time problem, but it definitely needs to be improved.

It would be even better if we had a fix to the SCEVExpander's pathological behavior.

Completely agree! The patch was a means to demonstrate the problem :).

sanjoy edited edge metadata.Mar 25 2017, 10:36 PM

Hi Wei,

If this is indeed a problem with ScalarEvolution::getZeroExtendExpr, then you should be able to write a .ll file that triggers the exponential behavior with -analyze -scalar-evolution. Do you mind giving such a test case a shot (as a first step)?

wmi updated this revision to Diff 94115.Apr 4 2017, 2:08 PM

I extended the test, and it now takes more than one hour on my Sandy Bridge machine when built with clang in release mode.
I added early returns in getZeroExtendExpr/getSignExtendExpr for SCEVAddRecExpr with the NW flag. As the test shows, the compile-time explosion can only happen when the step of the SCEVAddRecExpr is negative and the NW flag can be marked. With the change, the test now takes less than one second.

Sanjoy, could you take a look?

sanjoy requested changes to this revision.Apr 9 2017, 11:55 PM

Hi Wei,

I may be wrong, but I'm not convinced that your approach is correct (the reason is inline).

Is it possible to solve this by adding a cache that maps SCEV expressions to their zero- (and sign-) extended variants? We wouldn't want this cache to be permanent, since it could lock SCEV into a pessimistic state that it won't get out of even after (say) proving a value to be nuw. However, we can create one such cache in the lexical scope of the top-level getZeroExtendExpr (and maybe split out a getZeroExtendImpl that takes a pointer to said cache).

lib/Analysis/ScalarEvolution.cpp
1790 ↗(On Diff #94115)

I'm not sure this is correct. You're saying that, e.g., sext({A,+,B}<nw>) is {sext A,+,zext B}, but what if A is INT_MAX, B is 1, and the backedge-taken count is 1?
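This counterexample can be made concrete at 8 bits with a pair of hypothetical helpers (not SCEV code, just the arithmetic): with Start = 127 (INT8_MAX) and Step = 1, the 8-bit recurrence wraps to -128 after one backedge, so sext of the recurrence value and the rewritten {sext A,+,zext B} form disagree:

```cpp
#include <cassert>
#include <cstdint>

// sext of the actual 8-bit recurrence value {Start,+,Step} after It
// backedges; the 8-bit addition is allowed to wrap (modulo 256).
int16_t sextOfRecurrence(int8_t Start, int8_t Step, uint8_t It) {
  uint8_t V = (uint8_t)(Start + Step * It);
  return (int16_t)(int8_t)V;
}

// The rewritten form {sext Start,+,zext Step}: operands are extended
// first, so the 16-bit arithmetic never wraps.
int16_t rewrittenForm(int8_t Start, int8_t Step, uint8_t It) {
  return (int16_t)((int16_t)Start + (int16_t)(uint8_t)Step * It);
}
```

After one backedge the two sides are -128 and 128 respectively, so the rewrite is invalid without a no-signed-wrap guarantee.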

unittests/Analysis/ScalarEvolutionTest.cpp
625 ↗(On Diff #94115)

Did you consider generating this IR programmatically? That would be easier to modify.

This revision now requires changes to proceed.Apr 9 2017, 11:55 PM
wmi added a comment.Apr 10 2017, 2:11 PM

Sanjoy, thanks for the comments.

Is it possible to solve this by adding a cache that maps SCEV expressions to their zero (and sign) extended variants? We'd not want this cache to be permanent, since it can lock SCEV into a pessimistic state that it won't get out of even after (say) proving a value to be nuw. However, we can create one such cache in the lexical scope of the top level getZeroExtendExpr (and maybe split out a getZeroExtendImpl that takes a pointer to said cache).

It sounds like a good solution. I will try it if my simple fix is still not enough after adding the constraint.

lib/Analysis/ScalarEvolution.cpp
1790 ↗(On Diff #94115)

Is adding one more constraint isKnownNegative(Step) enough?

unittests/Analysis/ScalarEvolutionTest.cpp
625 ↗(On Diff #94115)

Ok, I will do it.

sanjoy added inline comments.Apr 10 2017, 2:17 PM
lib/Analysis/ScalarEvolution.cpp
1790 ↗(On Diff #94115)

I don't think so (btw, the same issue applies to the getSignExtendExpr case as well): {0,+,-1} is <nw> for a loop that runs (say) 10 times, but it unsigned-overflows the first time the loop's backedge is taken.

wmi updated this revision to Diff 95000.Apr 12 2017, 11:09 AM
wmi edited edge metadata.

Address Sanjoy's comments.

  • Implement local caches for getZeroExtendExpr and getSignExtendExpr.
  • Generate the IR programmatically for the testcase.
sanjoy requested changes to this revision.Apr 12 2017, 2:15 PM
sanjoy added inline comments.
lib/Analysis/ScalarEvolution.cpp
1509 ↗(On Diff #95000)

DenseMap is probably too heavyweight here -- can you change this to SmallDenseMap<8>?

2002 ↗(On Diff #95000)

Can you create a lambda for this and use return InsertResult(SExt); instead of goto RET;?

This revision now requires changes to proceed.Apr 12 2017, 2:15 PM
wmi updated this revision to Diff 95071.Apr 12 2017, 6:12 PM
wmi edited edge metadata.

Address Sanjoy's comments: Use SmallDenseMap and lambda.

sanjoy requested changes to this revision.Apr 12 2017, 6:32 PM

I think the code can be structured to make things more robust against future changes. PTAL and see if you agree:

lib/Analysis/ScalarEvolution.cpp
1520 ↗(On Diff #95071)

s/use/Use/
s/func/function/

1547 ↗(On Diff #95071)

No need to move this comment?

1734 ↗(On Diff #95071)

Can you please make SmallDenseMap<std::pair<const SCEV *, Type *>, const SCEV *, 8> a typedef?

1739 ↗(On Diff #95071)

s/use/Use/
s/func/function/

Also CacheResult may be a better name (sorry for not suggesting it before! -- hopefully this should be a quick find-replace).

Also, what do you think about an RAII object (under #ifndef NDEBUG) that checks that {Op, Ty} is in the cache on destruction? I can imagine someone adding a new return path and forgetting to call CacheResult.

Actually, a better solution would be to make the caching guarantee true by construction:

getZeroExtendExpr creates a cache and calls getZeroExtendExprCached with the cache. getZeroExtendExprCached calls getZeroExtendExprImpl to do the heavy lifting (which recursively calls getZeroExtendExprCached when needed); it also does the pre-check in the cache and the insertion into the cache (that way getZeroExtendExprImpl does not have to remember to insert into the cache at each return site).
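A sketch of that shape with toy stand-ins (std::string plays the role of const SCEV *, and the "extension" is just string building; only the structure is intended, not real SCEV logic):

```cpp
#include <cassert>
#include <map>
#include <string>

using Expr = std::string;
using ExtendCacheTy = std::map<Expr, Expr>;

static int ImplCalls = 0; // counts how often the heavy lifting runs

Expr getZeroExtendExprImpl(const Expr &Op, ExtendCacheTy &Cache);

// The wrapper owns both the cache pre-check and the insertion, so the
// Impl function cannot forget to cache a result at any return site.
Expr getZeroExtendExprCached(const Expr &Op, ExtendCacheTy &Cache) {
  auto It = Cache.find(Op);
  if (It != Cache.end())
    return It->second;
  Expr Res = getZeroExtendExprImpl(Op, Cache);
  Cache.insert({Op, Res});
  return Res;
}

Expr getZeroExtendExprImpl(const Expr &Op, ExtendCacheTy &Cache) {
  (void)Cache; // a real Impl would recurse via getZeroExtendExprCached
  ++ImplCalls;
  return "zext(" + Op + ")";
}

// Public entry point: the cache lives only for this top-level query, so
// it cannot lock the analysis into a stale pessimistic state.
Expr getZeroExtendExpr(const Expr &Op) {
  ExtendCacheTy Cache;
  return getZeroExtendExprCached(Op, Cache);
}
```

Querying the same expression twice through the wrapper with one cache invokes the Impl only once, which is the memoization the suggestion is after.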

This revision now requires changes to proceed.Apr 12 2017, 6:32 PM
wmi updated this revision to Diff 95151.Apr 13 2017, 10:00 AM
wmi edited edge metadata.

Address Sanjoy's comments: Add another two proxy functions: getZeroExtendExprCached and getSignExtendExprCached.

sanjoy accepted this revision.Apr 16 2017, 10:26 PM

lgtm with nits

include/llvm/Analysis/ScalarEvolution.h
1163 ↗(On Diff #95151)

I think the usual naming scheme here is ExtendCacheTy?

lib/Analysis/ScalarEvolution.cpp
1522 ↗(On Diff #95151)

If possible, you should

auto InsertResult = Cache.insert({{Op, Ty}, ZExt});
assert(InsertResult.second);

otherwise state that it is possible (how?) that the key is already present in the cache.

This revision is now accepted and ready to land.Apr 16 2017, 10:26 PM
wmi added a comment.Apr 17 2017, 10:52 AM

Sanjoy, thanks for all the helpful comments.

include/llvm/Analysis/ScalarEvolution.h
1163 ↗(On Diff #95151)

Fixed.

lib/Analysis/ScalarEvolution.cpp
1522 ↗(On Diff #95151)

I added the assertion, and at least the unittests and the LLVM test suite didn't trigger it.

This revision was automatically updated to reflect the committed changes.