
Details

Reviewers

efriedma
nikic
fhahn

Commits

rGd24a0e88576d: [SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic

Summary

The basic idea here is that, given a zero-extended narrow IV, we can prove the inner IV to be NUW if we can prove there is a value the inner IV must take before overflowing at which the loop must exit.
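To make the idea concrete, here is a small C++ model (not code from the patch) of the loop shape used in the new tests: an `i8` IV whose incremented value is zero-extended to `i16` and compared `ult` against an `i16` RHS. `simulateNarrowLoop`, `LoopResult`, and the 1024-iteration cap are hypothetical. With an RHS of 254 (the largest value `%n` can take in `@rhs_narrow_range`), the compare fails at 254 before the `i8` add can wrap, giving 253 backedges, which agrees with that test's max backedge-taken count. With an RHS of 257, every `i8` value is below it, so the exit is never taken and the IV wraps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of:
//   iv.next = add i8 iv, Step
//   continue while (zext i8 iv.next to i16) <u N
struct LoopResult {
  bool Exited;        // the exit was taken within the iteration cap
  bool Wrapped;       // the i8 add overflowed (unsigned) at some point
  uint32_t Backedges; // backedges taken (final count if Exited)
};

LoopResult simulateNarrowLoop(uint8_t Start, uint8_t Step, uint16_t N) {
  assert(Step != 0 && "a zero step never makes progress");
  LoopResult R{false, false, 0};
  uint8_t IV = Start;
  // 1024 iterations is far more than an i8 IV can take without wrapping.
  for (uint32_t Iter = 0; Iter < 1024; ++Iter) {
    unsigned Wide = unsigned(IV) + Step; // exact sum, computed without wrap
    if (Wide > 0xFF)
      R.Wrapped = true;                  // the narrow add wraps here
    uint8_t Next = static_cast<uint8_t>(Wide);
    if (!(static_cast<uint16_t>(Next) < N)) { // zext + icmp ult
      R.Exited = true;
      return R;
    }
    ++R.Backedges;
    IV = Next;
  }
  return R; // did not exit within the cap
}
```

For example, `simulateNarrowLoop(0, 1, 254)` exits after 253 backedges without wrapping, while `simulateNarrowLoop(0, 1, 257)` wraps and never exits.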

This is a follow-up to D108651.

Diff Detail

Event Timeline

reames created this revision. Sep 8 2021, 11:10 AM

Herald added subscribers: bollu, hiraditya, mcrosier. Sep 8 2021, 11:10 AM

reames requested review of this revision. Sep 8 2021, 11:10 AM

Herald added a project: Restricted Project. Sep 8 2021, 11:10 AM

reames mentioned this in D108651: [SCEV] Use no-self-wrap flags infered from exit structure to compute trip count. Sep 8 2021, 11:17 AM

Harbormaster completed remote builds in B123090: Diff 371397. Sep 8 2021, 12:28 PM

ping

efriedma added inline comments. Sep 13 2021, 3:45 PM

llvm/lib/Analysis/ScalarEvolution.cpp
11636	Why is the `StrideMax == 0` special case necessary?
11640	What is `APInt::getMaxValue(InnerBitWidth) - StrideMax` the limit of? I guess you're looking for the smallest possible value of `AR` at the step before it overflows. If that value forces the loop to exit, then the loop must exit before the overflow. A conservative way to figure that is based on the stride itself: just use the smallest value X such that X+Stride might overflow. A comment would be helpful. Also, I think you might be off by one here? `Limit` is one less than the value X I described. But maybe that cancels out somehow...
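The off-by-one question can be checked by brute force at `i8` width. The sketch below (helper names are hypothetical; an unsigned `ult` exit and a constant stride are assumed) searches, for each stride, for the largest RHS such that no start value lets `{Start,+,Stride}` overflow while the exit test keeps the loop running, and compares it with this diff's `APInt::getMaxValue(InnerBitWidth) - StrideMax`. Under these assumptions the tight bound is `256 - Stride`, the smallest X for which X + Stride overflows, so the diff's `Limit` is conservative by exactly one rather than unsound.

```cpp
#include <cassert>
#include <cstdint>

// True if some start value lets the i8 sequence {Start,+,Step} overflow
// while the exit test `zext(IV) <u RHS` keeps the loop running.
bool someStartWraps(unsigned Step, unsigned RHS) {
  assert(Step >= 1 && Step <= 255);
  for (unsigned Start = 0; Start <= 255; ++Start) {
    unsigned V = Start;
    while (V < RHS) {       // exit not taken: the loop continues
      if (V + Step > 255)   // the next i8 value would wrap
        return true;
      V += Step;            // Step >= 1, so V strictly increases
    }
  }
  return false;
}

// Largest RHS bound for which no start value can wrap before exiting.
// someStartWraps is monotone in RHS, so the first failure ends the search.
unsigned tightRHSBound(unsigned Step) {
  unsigned Bound = 0;
  while (Bound + 1 <= 256 && !someStartWraps(Step, Bound + 1))
    ++Bound;
  return Bound;
}

// The limit computed in Diff 371397: APInt::getMaxValue(8) - StrideMax.
unsigned patchLimit(unsigned Step) { return 255 - Step; }
```

For every stride from 1 to 255, `tightRHSBound(S)` equals `256 - S`, which is `patchLimit(S) + 1`.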

Address review comments with better comments; also fixed a bug noticed in the process: I used max in two places, whereas one of them needed to be a min for the required purpose.

Planning to land the test, then rebase this.

reames mentioned this in rGdf7c2bcf4e45: precommit tests for D109457. Sep 16 2021, 12:43 PM

Harbormaster completed remote builds in B124252: Diff 373028. Sep 16 2021, 12:46 PM

Rebase over landed tests.

One question for reviewers: we don't currently get any value from handling non-constant strides because, due to limitations in flag inference, we basically never conclude from data flow that the step is non-zero. (We might from control flow, but that doesn't influence constant ranges.) Should I keep the complexity, or just drop it to a constant non-zero step and be done?
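Handling a non-constant stride comes down to which end of the step's unsigned range each check needs. The following sketch uses a hypothetical inclusive interval type in place of `ConstantRange` (it is not LLVM's API): the minimum step must be non-zero so the IV sequence strictly increases, while the maximum step bounds how close to the wrap point the IV can get. This is presumably the max-versus-min mix-up mentioned earlier in the thread; the `Limit` formula mirrors Diff 371397.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an unsigned range of i8 step values.
struct URange8 {
  unsigned Min, Max; // inclusive bounds, Min <= Max <= 255
};

// Sketch of the NUW check for a narrow IV with a possibly non-constant step.
bool canProveNUWSketch(URange8 Step, unsigned RHSMax) {
  assert(Step.Min <= Step.Max && Step.Max <= 255);
  // The reasoning needs a strictly increasing sequence, so a zero step must
  // be ruled out. Testing Step.Max here would accept ranges like [0, 4].
  if (Step.Min == 0)
    return false;
  // The largest possible step determines how early the IV may overflow.
  unsigned Limit = 255 - Step.Max; // getMaxValue(8) - StrideMax
  return RHSMax <= Limit;
}
```

For example, a step known only to lie in [0, 4] is rejected outright, while a step in [1, 4] admits any RHS whose unsigned maximum is at most 251.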

Harbormaster completed remote builds in B124259: Diff 373037. Sep 16 2021, 1:54 PM

ping

p.s. In offline discussion, @nlewycky suggested a nice generalization of this idea in terms of invertible operations once we've proven the value being compared against is in the co-domain of the LHS. I want to land this as is, but I'm hoping to leverage that idea into a generalization both here, and in IndVarSimplify.

ping - this has been outstanding for a while and is blocking progress for me, any chance I can get someone to review?

JFYI, all of the other approaches I've mentioned in the review thread so far have not panned out. I can cover some cases with each, but not the motivating example. This patch's use of mustprogress to disprove the infinite loop case is critical. Every time I tried implementing this differently, I ended up just re-implementing this same idea without the infrastructure that SCEV already provides for this. I think this really does have to be in SCEV's trip count logic.

reames mentioned this in rG8b31f07cdf13: [tests] Add indvars tests showing missing transforms with small IVs. Oct 14 2021, 1:32 PM

In D109457#3064633, @reames wrote:

JFYI, all of the other approaches I've mentioned in the review thread so far have not panned out.

As often happens, taking a fresh look at something reveals another approach. I've posted https://reviews.llvm.org/D111836 which isn't really a replacement for this patch, but tackles problems in the same area, and is probably easier to review/justify.

reames mentioned this in rGc0d9bf2f6afd: [indvars] Allow rotation (narrowing) of exit test when discovering trip count. Nov 4 2021, 2:49 PM

Drop the must-exit logic entirely. As recently demonstrated in the indvars approach, none of the original motivating examples actually require it after we incorporate more precise range reasoning.

Add applyLoopGuards to constrain RHS, and fix an off-by-one bug which resulted in imprecise results at the edge. Additionally, include tests specifically focused on the edge case to demonstrate the correctness of that change.
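A hypothetical sketch of why applying loop guards to RHS helps, using plain integers rather than the SCEV API (`applyLoopGuards` itself is not called here): with no context, an `i16` RHS has unsigned maximum 65535 and nothing can be proven, but a dominating guard such as `N <u 200` caps it at 199. The bound used below, RHS <= 256 - StrideMax for an `i8` IV, follows from requiring V + StrideMax <= 255 for every `i8` value V <u RHS; treat it as an assumption about what the fixed code checks, since the final diff is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned maximum of RHS once a dominating guard `RHS <u GuardBound` is
// taken into account (a stand-in for what loop guards contribute).
uint32_t maxAfterUltGuard(uint32_t RHSMax, uint32_t GuardBound) {
  assert(GuardBound >= 1 && "RHS <u 0 can never hold");
  return std::min(RHSMax, GuardBound - 1);
}

// Every i8 value V <u RHS satisfies V + StrideMax <= 255 (no overflow)
// exactly when RHS <= 256 - StrideMax.
bool rhsMaxBoundsNarrowIV(uint32_t RHSMax, uint32_t StrideMax) {
  assert(StrideMax >= 1 && StrideMax <= 255);
  return RHSMax <= 256 - StrideMax;
}
```

With a stride of 1 this bound admits an RHS of 255, the edge case that a limit of `255 - StrideMax` would reject.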

As can now be seen from the test changes in finite-exit-comparisons.ll (the test file exercising the new indvars logic), this sometimes lets us LFTR an exit test instead of rotating/narrowing it. I've manually confirmed that we generate exit tests for all of the examples; why some still hit the rotate path will be investigated separately. (Edit: See https://bugs.llvm.org/show_bug.cgi?id=52423 for the result of that investigation. It's an LFTR limitation we should fix.) I'm hoping to be able to delete that logic again entirely, but, well, we'll see.

@nikic, @mkazantsev, @fhahn: Given that the must-exit logic has been removed and this is now pretty much just basic constant range reasoning, I'd greatly appreciate a review so we can get this in. After implementing the indvars approach, I'm more convinced than ever that SCEV really should just be able to compute a trip count for these loops. Everything else seems like a massive hack.

Harbormaster completed remote builds in B132731: Diff 385125. Nov 5 2021, 11:53 AM

LGTM

llvm/lib/Analysis/ScalarEvolution.cpp
11634

This revision is now accepted and ready to land. Nov 5 2021, 2:23 PM

Closed by commit rGd24a0e88576d: [SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic (authored by reames). Nov 5 2021, 3:37 PM

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGd24a0e88576d: [SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic.

Diff 371397

llvm/lib/Analysis/ScalarEvolution.cpp


  ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,
                                    const Loop *L, bool IsSigned,
                                    bool ControlsExit, bool AllowPredicates) {
    SmallPtrSet<const SCEVPredicate *, 4> Predicates;
    const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);
    bool PredicatedIV = false;

+   /// Return true if we can prove that this exit must taken on some iteration
+   /// of the loop.
+   auto exitMustBeTaken = [&]() {
+     if (!ControlsExit || !loopHasNoAbnormalExits(L))
+       return false;
+     return loopIsFiniteByAssumption(L);
+   };

Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'exitMustBeTaken' [readability-identifier-naming]

    auto canAssumeNoSelfWrap = [&](const SCEVAddRecExpr *AR) {
      // Can we prove this loop *must* be UB if overflow of IV occurs?
      // Reasoning goes as follows:
      // * Suppose the IV did self wrap.
      // * If Stride evenly divides the iteration space, then once wrap
      //   occurs, the loop must revisit the same values.
      // * We know that RHS is invariant, and that none of those values
      //   caused this exit to be taken previously. Thus, this exit is
      //   dynamically dead.
      // * If this is the sole exit, then a dead exit implies the loop
      //   must be infinite if there are no abnormal exits.
      // * If the loop were infinite, then it must either not be mustprogress
      //   or have side effects. Otherwise, it must be UB.
      // * It can't (by assumption), be UB so we have contradicted our
      //   premise and can conclude the IV did not in fact self-wrap.
      if (!isLoopInvariant(RHS, L))
        return false;
      auto *StrideC = dyn_cast<SCEVConstant>(AR->getStepRecurrence(*this));
      if (!StrideC || !StrideC->getAPInt().isPowerOf2())
        return false;
-     if (!ControlsExit || !loopHasNoAbnormalExits(L))
-       return false;
-     return loopIsFiniteByAssumption(L);
+     return exitMustBeTaken();
    };

    if (!IV) {
      if (auto *ZExt = dyn_cast<SCEVZeroExtendExpr>(LHS)) {
        const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(ZExt->getOperand());
        if (AR && AR->getLoop() == L && AR->isAffine()) {
          auto Flags = AR->getNoWrapFlags();
          if (!hasFlags(Flags, SCEV::FlagNW) && canAssumeNoSelfWrap(AR)) {
            Flags = setFlags(Flags, SCEV::FlagNW);
            SmallVector<const SCEV*> Operands{AR->operands()};
            Flags = StrengthenNoWrapFlags(this, scAddRecExpr, Operands, Flags);
-           setNoWrapFlags(const_cast<SCEVAddRecExpr *>(AR), Flags);
          }
+         auto canProveNUW = [&]() {

Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'canProveNUW' [readability-identifier-naming]

+           if (!isLoopInvariant(RHS, L))
+             return false;
+           auto StrideMax = getUnsignedRangeMax(AR->getStepRecurrence(*this));

nikic:

  return false;
- if (getUnsignedRangeMin(AR->getStepRecurrence(*this)).isZero())
+ if (!isKnownNonZero(AR->getStepRecurrence(*this)))
  // We need the sequence defined by AR to strictly increase in the

+           if (StrideMax == 0)
+             return false;

efriedma:

Why is the `StrideMax == 0` special case necessary?

+           const unsigned InnerBitWidth = getTypeSizeInBits(AR->getType());
+           const unsigned OuterBitWidth = getTypeSizeInBits(RHS->getType());
+           APInt Limit = APInt::getMaxValue(InnerBitWidth) - StrideMax;

efriedma:

What is `APInt::getMaxValue(InnerBitWidth) - StrideMax` the limit of?

I guess you're looking for the smallest possible value of `AR` at the step before it overflows. If that value forces the loop to exit, then the loop must exit before the overflow. A conservative way to figure that is based on the stride itself: just use the smallest value X such that X+Stride might overflow.

A comment would be helpful.

Also, I think you might be off by one here? `Limit` is one less than the value X I described. But maybe that cancels out somehow...

+           Limit = Limit.zext(OuterBitWidth);
+           auto RHSCR = getUnsignedRange(RHS);
+           if (exitMustBeTaken()) {
+             // If the exit must be taken, then we know that RHS can't take a
+             // value which can't be represented by the extension of the narrow
+             // IV. Otherwise, we'd have UB.
+             auto FullCR = ConstantRange::getFull(InnerBitWidth);
+             FullCR = FullCR.zeroExtend(OuterBitWidth);
+             RHSCR = RHSCR.intersectWith(FullCR, ConstantRange::Unsigned);
+           }
+           return RHSCR.getUnsignedMax().ule(Limit);
+         };
+         if (!hasFlags(Flags, SCEV::FlagNUW) && canProveNUW())
+           Flags = setFlags(Flags, SCEV::FlagNUW);
+         setNoWrapFlags(const_cast<SCEVAddRecExpr *>(AR), Flags);

          if (AR->hasNoUnsignedWrap()) {
            // Emulate what getZeroExtendExpr would have done during construction
            // if we'd been able to infer the fact just above at that time.
            const SCEV *Step = AR->getStepRecurrence(*this);
            Type *Ty = ZExt->getType();
            auto *S = getAddRecExpr(
              getExtendAddRecStart<SCEVZeroExtendExpr>(AR, Ty, this, 0),
              getZeroExtendExpr(Step, Ty, 0), L, AR->getNoWrapFlags());


llvm/test/Analysis/ScalarEvolution/trip-count-implied-addrec.ll

  for.body: ; preds = %entry, %for.body
    %zext = zext i8 %iv to i16
    %cmp = icmp ult i16 %zext, 257
    br i1 %cmp, label %for.body, label %for.end

  for.end: ; preds = %for.body, %entry
    ret void
  }

+ ; CHECK: Determining loop execution counts for: @rhs_mustexit_1
+ ; CHECK: Loop %for.body: backedge-taken count is (-1 + (1 umax (-1 + (zext i8 (trunc i16 %n.raw to i8) to i16))<nsw>))
+ ; CHECK: Loop %for.body: max backedge-taken count is -2
+ define void @rhs_mustexit_1(i16 %n.raw) mustprogress {
+ entry:
+   %n.and = and i16 %n.raw, 255
+   %n = add nsw i16 %n.and, -1
+   br label %for.body
+
+ for.body: ; preds = %entry, %for.body
+   %iv = phi i8 [ %iv.next, %for.body ], [ 0, %entry ]
+   %iv.next = add i8 %iv, 1
+   store i8 %iv, i8* @G
+   %zext = zext i8 %iv.next to i16
+   %cmp = icmp ult i16 %zext, %n
+   br i1 %cmp, label %for.body, label %for.end
+
+ for.end: ; preds = %for.body, %entry
+   ret void
+ }
+
+ ; CHECK: Determining loop execution counts for: @rhs_mustexit_3
+ ; CHECK: Loop %for.body: backedge-taken count is ((-1 + (3 umax (-3 + (zext i8 (trunc i16 %n.raw to i8) to i16))<nsw>)) /u 3)
+ ; CHECK: Loop %for.body: max backedge-taken count is 21844
+ define void @rhs_mustexit_3(i16 %n.raw) mustprogress {
+ entry:
+   %n.and = and i16 %n.raw, 255
+   %n = add nsw i16 %n.and, -3
+   br label %for.body
+
+ for.body: ; preds = %entry, %for.body
+   %iv = phi i8 [ %iv.next, %for.body ], [ 0, %entry ]
+   %iv.next = add i8 %iv, 3
+   store i8 %iv, i8* @G
+   %zext = zext i8 %iv.next to i16
+   %cmp = icmp ult i16 %zext, %n
+   br i1 %cmp, label %for.body, label %for.end
+
+ for.end: ; preds = %for.body, %entry
+   ret void
+ }
+
+ ; CHECK: Determining loop execution counts for: @neg_rhs_wrong_range
+ ; CHECK: Loop %for.body: Unpredictable backedge-taken count.
+ ; CHECK: Loop %for.body: Unpredictable max backedge-taken count.
+ define void @neg_rhs_wrong_range(i16 %n.raw) mustprogress {
+ entry:
+   %n.and = and i16 %n.raw, 255
+   %n = add nsw i16 %n.and, -1
+   br label %for.body
+
+ for.body: ; preds = %entry, %for.body
+   %iv = phi i8 [ %iv.next, %for.body ], [ 0, %entry ]
+   %iv.next = add i8 %iv, 2
+   store i8 %iv, i8* @G
+   %zext = zext i8 %iv.next to i16
+   %cmp = icmp ult i16 %zext, %n
+   br i1 %cmp, label %for.body, label %for.end
+
+ for.end: ; preds = %for.body, %entry
+   ret void
+ }
+
+ ; CHECK: Determining loop execution counts for: @neg_rhs_maybe_infinite
+ ; CHECK: Loop %for.body: Unpredictable backedge-taken count.
+ ; CHECK: Loop %for.body: Unpredictable max backedge-taken count.
+ define void @neg_rhs_maybe_infinite(i16 %n.raw) {
+ entry:
+   %n.and = and i16 %n.raw, 255
+   %n = add nsw i16 %n.and, -1
+   br label %for.body
+
+ for.body: ; preds = %entry, %for.body
+   %iv = phi i8 [ %iv.next, %for.body ], [ 0, %entry ]
+   %iv.next = add i8 %iv, 1
+   store i8 %iv, i8* @G
+   %zext = zext i8 %iv.next to i16
+   %cmp = icmp ult i16 %zext, %n
+   br i1 %cmp, label %for.body, label %for.end
+
+ for.end: ; preds = %for.body, %entry
+   ret void
+ }
+
+ ; Because of the range on RHS including only values within i8, we don't need
+ ; the must exit property
+ ; CHECK: Determining loop execution counts for: @rhs_narrow_range
+ ; CHECK: Loop %for.body: backedge-taken count is (-1 + (1 umax (2 * (zext i7 (trunc i16 (%n.raw /u 2) to i7) to i16))<nuw><nsw>))<nsw>
+ ; CHECK: Loop %for.body: max backedge-taken count is 253
+ define void @rhs_narrow_range(i16 %n.raw) {
+ entry:
+   %n = and i16 %n.raw, 254
+   br label %for.body
+
+ for.body: ; preds = %entry, %for.body
+   %iv = phi i8 [ %iv.next, %for.body ], [ 0, %entry ]
+   %iv.next = add i8 %iv, 1
+   store i8 %iv, i8* @G
+   %zext = zext i8 %iv.next to i16
+   %cmp = icmp ult i16 %zext, %n
+   br i1 %cmp, label %for.body, label %for.end
+
+ for.end: ; preds = %for.body, %entry
+   ret void
+ }


  declare void @llvm.assume(i1)

This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic
Closed · Public
