This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Use no-self-wrap flags infered from exit structure to compute trip count
ClosedPublic

Authored by reames on Aug 24 2021, 11:31 AM.

Details

Reviewers

nikic
efriedma
lebedev.ri

Commits

rG6cdca906c79f: [SCEV] Use no-self-wrap flags infered from exit structure to compute trip count

Summary

Ready for review.

The basic problem being solved is that we largely give up when a trip count involves an IV which is not an addrec. We do fall back to brute-force constant evaluation, but that path has no way to use the fact that the IV can't cycle back through the same set of values.

There's a high level design question of whether this is the right place to handle this, and if not, where that place is.

The major alternative here would be to return a conservative upper bound, and then rely on two invocations of indvars to add the facts to the narrow IV, and then reconstruct SCEV. (I have not implemented the alternative and am not 100% sure this would work out.) That's arguably more in line with existing code, but I find this substantially easier to reason about.

Thoughts?

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Aug 24 2021, 11:31 AM

Herald added subscribers: javed.absar, bollu, hiraditya, mcrosier. Aug 24 2021, 11:31 AM

reames requested review of this revision. Aug 24 2021, 11:31 AM

Herald added a project: Restricted Project. Aug 24 2021, 11:31 AM

reames added a parent revision: D108601: [SCEV] Infer nuw from nw for addrecs. Aug 24 2021, 11:31 AM

Harbormaster completed remote builds in B121016: Diff 368416. Aug 24 2021, 12:34 PM

efriedma added inline comments. Aug 24 2021, 12:58 PM

llvm/lib/Analysis/ScalarEvolution.cpp
11630	Can we move this logic to computeExitLimitFromICmp, or maybe even createAddRecFromPHI? It's basically independent of the predicate.
11640	cast<SCEVAddRecExpr>?

reames added inline comments. Aug 24 2021, 2:09 PM

llvm/lib/Analysis/ScalarEvolution.cpp
11630	We can move it out to the caller (computeExitLimitFromICmp), though I'd strongly prefer to land this and then do the move with separate tests added. I'd argue not in createAddRecFromPHI as I don't want to add the use list walk which would be required there.
11640	At least in theory, this could e.g. be folded to a constant with the additional facts. It's non-ideal that we wouldn't really exploit that, but this can't be a cast.
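
To illustrate the dyn_cast point in the reply above: a minimal sketch using standard C++ `dynamic_cast` rather than LLVM's casting utilities. `Expr`, `Constant`, `AddRec`, `makeAddRec`, and `asAddRecOrNull` are hypothetical stand-ins, not LLVM APIs, and the zero-step fold is only an assumed example of a factory that may hand back something other than an add recurrence.

```cpp
#include <cassert>
#include <memory>

// Hypothetical expression hierarchy (stand-ins, not LLVM classes).
struct Expr {
  virtual ~Expr() = default;
};
struct Constant : Expr {
  long value;
  explicit Constant(long v) : value(v) {}
};
struct AddRec : Expr {
  long start;
  long step;
  AddRec(long s, long t) : start(s), step(t) {}
};

// Hypothetical factory: asked for an add recurrence, it may simplify the
// request. Here a zero step is assumed to fold to a plain constant.
std::unique_ptr<Expr> makeAddRec(long start, long step) {
  if (step == 0)
    return std::make_unique<Constant>(start);
  return std::make_unique<AddRec>(start, step);
}

// Checked downcast: yields nullptr when the factory folded the expression,
// so the caller must be prepared to handle the "not an AddRec" case.
const AddRec *asAddRecOrNull(const Expr *e) {
  return dynamic_cast<const AddRec *>(e);
}

void demo() {
  auto rec = makeAddRec(0, 1);    // stays an add recurrence
  auto folded = makeAddRec(5, 0); // folds to a constant in this sketch
  assert(asAddRecOrNull(rec.get()) != nullptr);
  assert(asAddRecOrNull(folded.get()) == nullptr);
}
```

An unconditional cast would be wrong for `folded`; the checked form lets the caller fall through to the existing non-addrec handling instead.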

reames mentioned this in rG35b0b1a64af5: [test] Prcommit tests for D108651. Aug 24 2021, 2:19 PM

reames mentioned this in rG4d235bf75d04: [tests] Add a couple tests for intersection of ec8d87e and D108651. Aug 24 2021, 2:29 PM

Rebase over landed changes and tests

Harbormaster completed remote builds in B121056: Diff 368475. Aug 24 2021, 4:00 PM

ping

ping x2

ping x 3

@efriedma Any chance you could take a look?

LGTM. Sorry about the delay.

(Please clang-format the patch.)

llvm/lib/Analysis/ScalarEvolution.cpp
11630	> I'd argue not in createAddRecFromPHI as I don't want to add the use list walk which would be required there.
	I think we already do a similar sort of walk to try to prove poison? Maybe worth looking into. But not a priority, sure; the nowrap is less likely to be useful in other contexts.

This revision is now accepted and ready to land. Sep 7 2021, 4:06 PM

This revision was landed with ongoing or failed builds. Sep 7 2021, 5:01 PM

Closed by commit rG6cdca906c79f: [SCEV] Use no-self-wrap flags infered from exit structure to compute trip count (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG6cdca906c79f: [SCEV] Use no-self-wrap flags infered from exit structure to compute trip count.

reames mentioned this in D109457: [SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic. Sep 8 2021, 11:10 AM

After landing this, I've explored more test cases and have largely convinced myself that this should live in indvarsimplify instead. I don't fully understand why yet, but despite recording the stronger flags in SCEV, we're never able to back propagate them into the IR. (I believe it has something to do with the mutation of existing SCEVs and lack of invalidation resulting in imprecise results continuing to be returned for certain queries.) I think we'd be overall better off letting them be added to the IR, even if SCEV isn't able to leverage them until the next rebuild. The other option is that we duplicate the logic, but I'd really rather not.

I've proposed D109457 which builds on this, and I want to land it with this architecture, but once that's done, I plan on moving the whole bit of logic to indvars (with separate review).

Revision Contents

Path	Size
llvm/lib/Analysis/ScalarEvolution.cpp	83 lines
llvm/test/Analysis/ScalarEvolution/trip-count-implied-addrec.ll	4 lines

Diff 371218

llvm/lib/Analysis/ScalarEvolution.cpp

[11,575 lines not shown]
 ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,
                                   const Loop *L, bool IsSigned,
                                   bool ControlsExit, bool AllowPredicates) {
   SmallPtrSet<const SCEVPredicate *, 4> Predicates;

   const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);
   bool PredicatedIV = false;

+  auto canAssumeNoSelfWrap = [&](const SCEVAddRecExpr *AR) {
+    // Can we prove this loop must be UB if overflow of IV occurs?
+    // Reasoning goes as follows:
+    // * Suppose the IV did self wrap.
+    // * If Stride evenly divides the iteration space, then once wrap
+    //   occurs, the loop must revisit the same values.
+    // * We know that RHS is invariant, and that none of those values
+    //   caused this exit to be taken previously. Thus, this exit is
+    //   dynamically dead.
+    // * If this is the sole exit, then a dead exit implies the loop
+    //   must be infinite if there are no abnormal exits.
+    // * If the loop were infinite, then it must either not be mustprogress
+    //   or have side effects. Otherwise, it must be UB.
+    // * It can't (by assumption), be UB so we have contradicted our
+    //   premise and can conclude the IV did not in fact self-wrap.
+    if (!isLoopInvariant(RHS, L))
+      return false;
+
+    auto *StrideC = dyn_cast<SCEVConstant>(AR->getStepRecurrence(*this));
+    if (!StrideC || !StrideC->getAPInt().isPowerOf2())
+      return false;
+
+    if (!ControlsExit || !loopHasNoAbnormalExits(L))
+      return false;
+
+    return loopIsFiniteByAssumption(L);
+  };
+
+  if (!IV) {
+    if (auto *ZExt = dyn_cast<SCEVZeroExtendExpr>(LHS)) {
+      const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(ZExt->getOperand());
+      if (AR && AR->getLoop() == L && AR->isAffine()) {
+        auto Flags = AR->getNoWrapFlags();
+        if (!hasFlags(Flags, SCEV::FlagNW) && canAssumeNoSelfWrap(AR)) {
+          Flags = setFlags(Flags, SCEV::FlagNW);
+
+          SmallVector<const SCEV*> Operands{AR->operands()};
+          Flags = StrengthenNoWrapFlags(this, scAddRecExpr, Operands, Flags);
+
+          setNoWrapFlags(const_cast<SCEVAddRecExpr *>(AR), Flags);
+        }
+        if (AR->hasNoUnsignedWrap()) {
+          // Emulate what getZeroExtendExpr would have done during construction
+          // if we'd been able to infer the fact just above at that time.
+          const SCEV *Step = AR->getStepRecurrence(*this);
+          Type *Ty = ZExt->getType();
+          auto *S = getAddRecExpr(
+            getExtendAddRecStart<SCEVZeroExtendExpr>(AR, Ty, this, 0),
+            getZeroExtendExpr(Step, Ty, 0), L, AR->getNoWrapFlags());
+          IV = dyn_cast<SCEVAddRecExpr>(S);
+        }
+      }
+    }
+  }
+
+
   if (!IV && AllowPredicates) {
     // Try to make this an AddRec using runtime tests, in the first X
     // iterations of this loop, where X is the SCEV expression found by the
     // algorithm below.
     IV = convertSCEVToAddRecWithPredicates(LHS, L, Predicates);
     PredicatedIV = true;
   }

   // Avoid weird loops
[96 lines not shown; enclosing scope: if (!isKnownNonZero(Stride)) {]
         return isLoopEntryGuardedByCond(L, Cond, StartIfZero, RHS);
       };
       if (!wouldZeroStrideBeUB()) {
         Stride = getUMaxExpr(Stride, getOne(Stride->getType()));
       }
     }
   } else if (!Stride->isOne() && !NoWrap) {
     auto isUBOnWrap = [&]() {
-      // Can we prove this loop must be UB if overflow of IV occurs?
-      // Reasoning goes as follows:
-      // * Suppose the IV did self wrap.
-      // * If Stride evenly divides the iteration space, then once wrap
-      //   occurs, the loop must revisit the same values.
-      // * We know that RHS is invariant, and that none of those values
-      //   caused this exit to be taken previously. Thus, this exit is
-      //   dynamically dead.
-      // * If this is the sole exit, then a dead exit implies the loop
-      //   must be infinite if there are no abnormal exits.
-      // * If the loop were infinite, then it must either not be mustprogress
-      //   or have side effects. Otherwise, it must be UB.
-      // * It can't (by assumption), be UB so we have contradicted our
-      //   premise and can conclude the IV did not in fact self-wrap.
       // From no-self-wrap, we need to then prove no-(un)signed-wrap. This
       // follows trivially from the fact that every (un)signed-wrapped, but
       // not self-wrapped value must be LT than the last value before
       // (un)signed wrap. Since we know that last value didn't exit, nor
       // will any smaller one.
-      if (!isLoopInvariant(RHS, L))
-        return false;
-
-      auto *StrideC = dyn_cast<SCEVConstant>(Stride);
-      if (!StrideC || !StrideC->getAPInt().isPowerOf2())
-        return false;
-
-      if (!ControlsExit || !loopHasNoAbnormalExits(L))
-        return false;
-
-      return loopIsFiniteByAssumption(L);
+      return canAssumeNoSelfWrap(IV);
     };

     // Avoid proven overflow cases: this will ensure that the backedge taken
     // count will not generate any unsigned overflow. Relaxed no-overflow
     // conditions exploit NoWrapFlags, allowing to optimize in presence of
     // undefined behaviors like the case of C language.
     if (canIVOverflowOnLT(RHS, Stride, IsSigned) && !isUBOnWrap())
       return getCouldNotCompute();
[2,365 lines not shown]
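
The power-of-two stride step in the canAssumeNoSelfWrap comment can be checked by brute force on an 8-bit IV. The sketch below is standalone C++, not LLVM code; `isPowerOf2`, `revisitsOnSelfWrap`, and `countMismatches` are hypothetical names introduced only for illustration, and the 8-bit width is an assumption chosen to keep the search small.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical helper: true if x is a non-zero power of two.
bool isPowerOf2(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

// Walk an 8-bit IV from `start` in steps of `stride` until it has travelled
// at least 256, i.e. until it has passed its own starting point (self-wrap).
// Returns true if the value it lands on had already been visited.
bool revisitsOnSelfWrap(std::uint8_t start, std::uint8_t stride) {
  assert(stride != 0 && "a zero stride never advances");
  std::set<std::uint8_t> seen;
  unsigned travelled = 0; // total distance, deliberately not reduced mod 256
  std::uint8_t iv = start;
  while (travelled < 256) {
    seen.insert(iv);
    iv = static_cast<std::uint8_t>(iv + stride);
    travelled += stride;
  }
  return seen.count(iv) != 0;
}

// Count the (start, stride) pairs for which "lands on an old value after
// self-wrap" disagrees with "stride is a power of two".
int countMismatches() {
  int mismatches = 0;
  for (unsigned start = 0; start < 256; ++start)
    for (unsigned stride = 1; stride < 256; ++stride)
      if (revisitsOnSelfWrap(static_cast<std::uint8_t>(start),
                             static_cast<std::uint8_t>(stride)) !=
          isPowerOf2(stride))
        ++mismatches;
  return mismatches;
}
```

With a power-of-two stride the IV lands exactly on its start value and repeats values that already failed to take the exit; with any other non-zero stride the first value after self-wrap is new, so the "dead exit" step of the argument would not go through.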

llvm/test/Analysis/ScalarEvolution/trip-count-implied-addrec.ll

 ; RUN: opt < %s -disable-output "-passes=print<scalar-evolution>" -scalar-evolution-classify-expressions=0 2>&1 | FileCheck %s

 ; A collection of tests that show we can use facts about an exit test to
 ; infer tighter bounds on an IV, and thus refine an IV into an addrec. The
 ; basic tactic being used is proving NW from exit structure and then
 ; implying NUW/NSW. Once NSW/NUW is inferred, we can derive addrecs from
 ; the zext/sext cases that we couldn't at initial SCEV construction.

 @G = external global i8

 ; CHECK-LABEL: Determining loop execution counts for: @nw_implies_nuw
-; CHECK: Loop %for.body: Unpredictable backedge-taken count
-; CHECK: Loop %for.body: Unpredictable max backedge-taken count
+; CHECK: Loop %for.body: backedge-taken count is %n
+; CHECK: Loop %for.body: max backedge-taken count is -1
 define void @nw_implies_nuw(i16 %n) mustprogress {
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %iv = phi i8 [ %iv.next, %for.body ], [ 0, %entry ]
   %iv.next = add i8 %iv, 1
   %zext = zext i8 %iv to i16
[83 lines not shown]
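
A rough C++ model of why the backedge-taken count for @nw_implies_nuw becomes %n. The exit compare is not visible in the truncated listing, so this sketch ASSUMES the loop continues while zext(iv) < n as an unsigned 16-bit comparison; `backedgeTakenCount`, `kNeverExits`, and `checkSmallBounds` are hypothetical names, and none of this is LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel for "the exit is never taken" (hypothetical, for this sketch).
constexpr int kNeverExits = -1;

// Simulates a loop with an 8-bit IV starting at 0 and stepping by 1.
// ASSUMPTION: the exit test keeps looping while zext(iv) < n.
int backedgeTakenCount(std::uint16_t n) {
  std::uint8_t iv = 0;
  int taken = 0;
  for (;;) {
    if (!(static_cast<std::uint16_t>(iv) < n))
      return taken; // exit test failed: leave the loop
    iv = static_cast<std::uint8_t>(iv + 1);
    ++taken;
    if (iv == 0)
      return kNeverExits; // IV wrapped back to its start value; the same
                          // values repeat, so the exit can never be taken
  }
}

// For every bound the 8-bit IV can reach, the count is exactly n.
void checkSmallBounds() {
  for (unsigned n = 0; n <= 255; ++n)
    assert(backedgeTakenCount(static_cast<std::uint16_t>(n)) ==
           static_cast<int>(n));
}
```

Under that assumption, any n of 256 or more makes the narrow IV wrap and the loop spin forever. A mustprogress loop with no side effects cannot do that without being UB, which is what lets SCEV treat the IV as non-wrapping and report the count as %n.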