This is an archive of the discontinued LLVM Phabricator instance.

Differential D67178

[SCEV] Use loop guard info when computing the max BE taken count in howFarToZero.
Closed · Public

Authored by fhahn on Sep 4 2019, 8:14 AM.

Details

Reviewers

sanjoy.google
reames
efriedma

Commits

rGd4ddf63fc40c: [SCEV] Use loop guard info when computing the max BE taken count in…

Summary

For some expressions, we can use information from loop guards when
we are looking for a maximum. This patch applies information from
loop guards to the expression used to compute the maximum backedge
taken count in howFarToZero. It currently replaces an unknown
expression X with UMin(X, Y), if the loop is guarded by
X ult Y.

This patch is minimal in what conditions it applies, and there
are a few TODOs to generalize.

This partly addresses PR40961. We will also need an update to
LV to address it completely.
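
To make the effect of the clamp concrete, here is a minimal brute-force sketch. It is not LLVM code: it assumes an 8-bit stand-in for the i64 loops in the tests, and the function names are hypothetical. The loop starts at i and exits once iv + 1 == 16, so the backedge-taken count is (15 - i) modulo 2^8. Note that the posted diff clamps X to Y - 1 for a guard X ult Y, which is what the sketch does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// 8-bit model: iv starts at i and the loop exits once iv + 1 == 16, so the
// backedge is taken (15 - i) mod 2^8 times.
uint8_t backedgeTakenCount(uint8_t i) {
  return static_cast<uint8_t>(15u - i);
}

// Largest backedge-taken count over every possible start value. With
// useGuard set, the start value is first clamped to umin(i, 16 - 1), which is
// the information a dominating guard "i ult 16" provides.
unsigned maxBackedgeTakenCount(bool useGuard) {
  unsigned best = 0;
  for (unsigned i = 0; i <= 255; ++i) {
    uint8_t start = static_cast<uint8_t>(i);
    if (useGuard)
      start = std::min<uint8_t>(start, 15);
    best = std::max<unsigned>(best, backedgeTakenCount(start));
  }
  return best;
}
```

Without the guard the worst case is the all-ones value (printed as -1 by SCEV); with the clamp it drops to 15, matching the updated CHECK lines in the test file.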

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Sep 4 2019, 8:14 AM

Herald added a project: Restricted Project. Sep 4 2019, 8:14 AM

Herald added subscribers: hiraditya, javed.absar.

Harbormaster completed remote builds in B37731: Diff 218713. Sep 4 2019, 8:14 AM

fhahn added parent revisions: D67177: [SCEV] Support SCEVUMinExpr in getRangeRef., D67176: [SCEV] Generalize SCEVParameterRewriter to accept SCEV expression as target. Sep 4 2019, 8:15 AM

Instead of writing a C++ unittest, you should be able to use "opt -analyze -scalar-evolution" to test this.

Not sure I like the approach; rewriting the SCEV to a more complicated expression seems like it's stacking complexity. But it might be the least code overall, I guess.

In D67178#1658197, @efriedma wrote:

Instead of writing a C++ unittest, you should be able to use "opt -analyze -scalar-evolution" to test this.

Sure, I'll do that. The current version of the unit test only tests stuff covered by -analyze -scalar-evolution.

Not sure I like the approach; rewriting the SCEV to a more complicated expression seems like it's stacking complexity. But it might be the least code overall, I guess.

Yep, rewriting the expression was the least invasive change. But I would be happy to change the implementation to use additional conditions differently. To cover the motivating case for the patch, it would be enough to collect the additional conditions, pass them to getRangeRef and use them there. But I think the additional conditions could be helpful in the various reasoning functions as well. Currently we can pass one additional condition to most reasoning functions, but here it would be helpful to pass multiple conditions. I am happy with either direction, please let me know what you prefer :)

In terms of general API, I don't think we want to expose "applyLoopGuards"; the SCEV transform proposed here isn't really useful outside of trying to find the minimum or maximum, as far as I can tell. Which min/max expressions we want to form depends on whether we're computing a "max" or a "min". And restricting the API so the point in the CFG we're querying has to be a loop header doesn't seem helpful; other places might care about values after a loop etc.

In terms of the implementation, this composes well, in a sense: it annotates the relevant SCEV expressions with the relevant conditions, then uses the general implementation that ignores control flow. But I'm not sure how it scales for larger SCEV expressions; keeping a map on the side seems like it would have more predictable performance.

I don't really have a general vision for what the range computation implementation should look like in the future: what sources of information are practical to use? What overall data structure do we use to integrate them?

In D67178#1658602, @efriedma wrote:

In terms of general API, I don't think we want to expose "applyLoopGuards"; the SCEV transform proposed here isn't really useful outside of trying to find the minimum or maximum, as far as I can tell. Which min/max expressions we want to form depends on whether we're computing a "max" or a "min". And restricting the API so the point in the CFG we're querying has to be a loop header doesn't seem helpful; other places might care about values after a loop etc.

In terms of the implementation, this composes well, in a sense: it annotates the relevant SCEV expressions with the relevant conditions, then uses the general implementation that ignores control flow. But I'm not sure how it scales for larger SCEV expressions; keeping a map on the side seems like it would have more predictable performance.

Yeah, duplicating a whole expression just because we want to attach additional range information could lead to poor performance for large expressions. I'll look into the map approach.

I don't really have a general vision for what the range computation implementation should look like in the future: what sources of information are practical to use? What overall data structure do we use to integrate them?

Do you mean in general or for SCEV?

Do you mean in general or for SCEV?

Well, I guess there's a question in general, but here I'm more concerned specifically about SCEV: what other analysis can it use, what sort of data structures can we use for caching. But maybe I'm looking too many steps ahead; I don't want to get in the way of an incremental improvement.

fhahn mentioned this in D67176: [SCEV] Generalize SCEVParameterRewriter to accept SCEV expression as target. Sep 11 2019, 9:03 AM

fhahn mentioned this in D67514: [SCEV] Add smin/umin support to getRangeRef. Sep 12 2019, 1:07 PM

SCEV already has support for isLoopEntryGuardedByCond, which should be a superset of the added code. Have you explored why the distance expression isn't canonicalized at construction?

(There are some circularity issues between exit counts and scev canonical forms, so that might be your issue. Or it might be something easy to fix. We can always hope.)

In D67178#1670316, @reames wrote:

SCEV already has support for isLoopEntryGuardedByCond, which should be a superset of the added code.

Agreed, I think the final version of the patch should definitely share the guard discovery logic with isLoopEntryGuardedByCond.

Have you explored why the distance expression isn't canonicalized at construction?

Do you mean why the information from the loop guard is not included in the distance expression when we create it?
IIUC, the distance expression here is the subtraction of RHS and LHS of the exit condition. As Eli indicated, including information from the guards in the expressions for the operands of the condition might increase the complexity without much gain, besides using it to get more precise upper bounds on ranges. Do you think there would be additional benefits to including information from the guards in the distance expression, besides better range information? AFAIK, some of the reasoning methods themselves try to rewrite/simplify expressions based on information from loop guards, and we might be able to skip some of those rewrites.

Re-ping

@efriedma @reames

Finally had some time to get back to this one. Rebased and also added SCEV test for PR40961. I will also post a follow-up that extends this to also support assumes to catch PR47247.

Sorry for the long delay! There are still bits that can be improved (e.g. sharing code with isLoopEntryGuardedByCond), but I'd like to converge on the general approach before focusing on that.

In D67178#1670316, @reames wrote:

SCEV already has support for isLoopEntryGuardedByCond, which should be a superset of the added code.

Yes, this code here is similar to isLoopEntryGuardedByCond in that it looks at the same information (conditions dominating the loop body). But it is also different conceptually, I think. isLoopEntryGuardedByCond is useful if we know which question to ask, e.g. we could use the loop guards to check if an expression is less than a constant (but we need to know which specific constant/condition to check).

Unfortunately I do not think this is very useful to clamp the range of the expressions at hand here, because it would require both explicitly checking what information is available through guards, as well as checking how it is used in the expression.

The approach of applyLoopGuards is different: the idea is to re-write an expression with information from the loop guards and use the existing reasoning to try to use the additional info to simplify the overall expression. This approach seems quite effective and allows for all the existing simplification logic to be completely reused.
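
The rewrite-then-reuse idea can be illustrated with a toy model. This is a minimal sketch under loud assumptions: the expression tree, the `rewrite` helper and the range evaluator below are hypothetical stand-ins, not SCEV, SCEVParameterRewriter or getRangeRef. The point it tries to show is that the range evaluator knows nothing about guards; it only becomes more precise because the unknown was replaced by a umin before the query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree. Every name here is hypothetical, not LLVM's API.
struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  enum Kind { Unknown, Constant, Sub, UMin } kind;
  std::string name; // used by Unknown
  uint64_t value;   // used by Constant
  ExprPtr lhs, rhs; // used by Sub and UMin
};

ExprPtr makeUnknown(const std::string &name) {
  return std::make_shared<Expr>(Expr{Expr::Unknown, name, 0, nullptr, nullptr});
}
ExprPtr makeConstant(uint64_t value) {
  return std::make_shared<Expr>(Expr{Expr::Constant, "", value, nullptr, nullptr});
}
ExprPtr makeBinary(Expr::Kind kind, ExprPtr lhs, ExprPtr rhs) {
  return std::make_shared<Expr>(Expr{kind, "", 0, std::move(lhs), std::move(rhs)});
}

// Replace unknowns that appear in the map; everything else is rebuilt as-is.
ExprPtr rewrite(const ExprPtr &e, const std::map<std::string, ExprPtr> &map) {
  switch (e->kind) {
  case Expr::Unknown: {
    auto it = map.find(e->name);
    return it == map.end() ? e : it->second;
  }
  case Expr::Constant:
    return e;
  default:
    return makeBinary(e->kind, rewrite(e->lhs, map), rewrite(e->rhs, map));
  }
}

struct Range { uint64_t lo, hi; }; // inclusive unsigned bounds

// Context-free range computation: it never looks at control flow.
Range unsignedRange(const ExprPtr &e) {
  switch (e->kind) {
  case Expr::Unknown:
    return {0, UINT64_MAX};
  case Expr::Constant:
    return {e->value, e->value};
  case Expr::UMin: {
    Range l = unsignedRange(e->lhs), r = unsignedRange(e->rhs);
    return {std::min(l.lo, r.lo), std::min(l.hi, r.hi)};
  }
  case Expr::Sub: {
    Range l = unsignedRange(e->lhs), r = unsignedRange(e->rhs);
    if (l.lo >= r.hi) // the subtraction can never wrap
      return {l.lo - r.hi, l.hi - r.lo};
    return {0, UINT64_MAX};
  }
  }
  return {0, UINT64_MAX};
}

// Upper bound of (15 - i), with or without the guard "i ult 16" applied.
uint64_t maxDistance(bool applyGuard) {
  ExprPtr i = makeUnknown("i");
  ExprPtr distance = makeBinary(Expr::Sub, makeConstant(15), i);
  if (applyGuard) {
    std::map<std::string, ExprPtr> rewriteMap;
    rewriteMap["i"] = makeBinary(Expr::UMin, i, makeConstant(15));
    distance = rewrite(distance, rewriteMap);
  }
  return unsignedRange(distance).hi;
}
```

In this model `maxDistance(false)` is the full unsigned range and `maxDistance(true)` is 15, without the evaluator ever being told which question to ask about the guard.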

Have you explored why the distance expression isn't canonicalized at construction?

(There are some circularity issues between exit counts and scev canonical forms, so that might be your issue. Or it might be something easy to fix. We can always hope.)

I thought about this one a bit more. I am not sure any general canonicalization is missing here, because the information we apply here is very context specific: we can use information from the guards because we construct an expression that is only valid in the loop body. And the backedge-taken count is an artificial expression we are building up specifically in howFarToZero (and its sibling functions). Unless I am missing something, I don't think there is any general canonicalization that is missing here.

Harbormaster completed remote builds in B72067: Diff 292596. Sep 17 2020, 12:55 PM

fhahn added a child revision: D87854: [SCEV] Also use info from assumes in applyLoopGuards. Sep 17 2020, 12:57 PM

LGTM. This makes a nice starting point - as you note, we can definitely extend this.

This revision is now accepted and ready to land. Sep 17 2020, 1:55 PM

In D67178#2280288, @reames wrote:

LGTM. This makes a nice starting point - as you note, we can definitely extend this.

Thank you very much for taking a look! I made some smaller changes to fix a few edge-cases and extended the test coverage in 3cbdfe424fec.

I plan to land this in a day or two.

Harbormaster completed remote builds in B72403: Diff 293210. Sep 21 2020, 10:14 AM

Another rebase just before committing.

Harbormaster completed remote builds in B72789: Diff 293991. Sep 24 2020, 3:02 AM

This revision was landed with ongoing or failed builds. Sep 24 2020, 3:14 AM

Closed by commit rGd4ddf63fc40c: [SCEV] Use loop guard info when computing the max BE taken count in… (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rGd4ddf63fc40c: [SCEV] Use loop guard info when computing the max BE taken count in….

Revision Contents

Path                                                                        Size
llvm/include/llvm/Analysis/ScalarEvolution.h                                3 lines
llvm/lib/Analysis/ScalarEvolution.cpp                                       55 lines
llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll   8 lines
llvm/test/Transforms/LoopVectorize/AArch64/pr36032.ll                       77 lines
llvm/unittests/Analysis/ScalarEvolutionTest.cpp                             65 lines

Diff 293998

llvm/include/llvm/Analysis/ScalarEvolution.h

[...] private:
   /// Find all of the loops transitively used in \p S, and update \c LoopUsers
   /// accordingly.
   void addToLoopUseLists(const SCEV *S);

   /// Try to match the pattern generated by getURemExpr(A, B). If successful,
   /// Assign A and B to LHS and RHS, respectively.
   bool matchURem(const SCEV *Expr, const SCEV *&LHS, const SCEV *&RHS);

+  /// Try to apply information from loop guards for \p L to \p Expr.
+  const SCEV *applyLoopGuards(const SCEV *Expr, const Loop *L);
+
   /// Look for a SCEV expression with type `SCEVType` and operands `Ops` in
   /// `UniqueSCEVs`.
   ///
   /// The first component of the returned tuple is the SCEV if found and null
   /// otherwise. The second component is the `FoldingSetNodeID` that was
   /// constructed to look up the SCEV and the third component is the insertion
   /// point.
   std::tuple<SCEV *, FoldingSetNodeID, void *>
[...]

llvm/lib/Analysis/ScalarEvolution.cpp

[...] ScalarEvolution::howFarToZero(const SCEV *V, const Loop *L, bool ControlsExit,
   // First compute the unsigned distance from zero in the direction of Step.
   bool CountDown = StepC->getAPInt().isNegative();
   const SCEV *Distance = CountDown ? Start : getNegativeSCEV(Start);

   // Handle unitary steps, which cannot wraparound.
   // 1*N = -Start; -1*N = Start (mod 2^BW), so:
   //   N = Distance (as unsigned)
   if (StepC->getValue()->isOne() || StepC->getValue()->isMinusOne()) {
-    APInt MaxBECount = getUnsignedRangeMax(Distance);
+    APInt MaxBECount = getUnsignedRangeMax(applyLoopGuards(Distance, L));
+    APInt MaxBECountBase = getUnsignedRangeMax(Distance);
+    if (MaxBECountBase.ult(MaxBECount))
+      MaxBECount = MaxBECountBase;

     // When a loop like "for (int i = 0; i != n; ++i) { /* body */ }" is rotated,
     // we end up with a loop whose backedge-taken count is n - 1. Detect this
     // case, and see if we can improve the bound.
     //
     // Explicitly handling this here is necessary because getUnsignedRange
     // isn't context-sensitive; it doesn't know that we only care about the
     // range inside the loop.
[...] if (!isa<SCEVCouldNotCompute>(ExitCount)) {
              "dominate latch!");
       ExitCounts.push_back(ExitCount);
     }
   }
   if (ExitCounts.empty())
     return getCouldNotCompute();
   return getUMinFromMismatchedTypes(ExitCounts);
 }

+const SCEV *ScalarEvolution::applyLoopGuards(const SCEV *Expr, const Loop *L) {
+  // Starting at the loop predecessor, climb up the predecessor chain, as long
+  // as there are predecessors that can be found that have unique successors
+  // leading to the original header.
+  // TODO: share this logic with isLoopEntryGuardedByCond.
+  ValueToSCEVMapTy RewriteMap;
+  for (std::pair<const BasicBlock *, const BasicBlock *> Pair(
+           L->getLoopPredecessor(), L->getHeader());
+       Pair.first; Pair = getPredecessorWithUniqueSuccessorForBB(Pair.first)) {
+
+    const BranchInst *LoopEntryPredicate =
+        dyn_cast<BranchInst>(Pair.first->getTerminator());
+    if (!LoopEntryPredicate || LoopEntryPredicate->isUnconditional())
+      continue;
+
+    // TODO: use information from more complex conditions, e.g. AND expressions.
+    auto *Cmp = dyn_cast<ICmpInst>(LoopEntryPredicate->getCondition());
+    if (!Cmp)
+      continue;
+
+    auto Predicate = Cmp->getPredicate();
+    if (LoopEntryPredicate->getSuccessor(1) == Pair.second)
+      Predicate = CmpInst::getInversePredicate(Predicate);
+    // TODO: use information from more predicates.
+    switch (Predicate) {
+    case CmpInst::ICMP_ULT: {
+      const SCEV *LHS = getSCEV(Cmp->getOperand(0));
+      const SCEV *RHS = getSCEV(Cmp->getOperand(1));
+      if (isa<SCEVUnknown>(LHS) && !isa<UndefValue>(Cmp->getOperand(0)) &&
+          !containsAddRecurrence(RHS)) {
+        const SCEV *Base = LHS;
+        auto I = RewriteMap.find(Cmp->getOperand(0));
+        if (I != RewriteMap.end())
+          Base = I->second;
+
+        RewriteMap[Cmp->getOperand(0)] =
+            getUMinExpr(Base, getMinusSCEV(RHS, getOne(RHS->getType())));
+      }
+      break;
+    }
+    default:
+      break;
+    }
+  }
+
+  if (RewriteMap.empty())
+    return Expr;
+  return SCEVParameterRewriter::rewrite(Expr, *this, RewriteMap);
+}
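
As a rough illustration of how the guard walk in applyLoopGuards composes, here is a minimal standalone sketch. It assumes every guard compares the same unknown against a constant, and the `Guard` struct and function names are hypothetical, not part of the patch. It mirrors two details of the diff: the predicate is inverted when the loop is reached through the false successor, and each `ult` guard tightens the bound to umin(previous, RHS - 1), so the order of the guards does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

enum class Pred { ULT, UGE };

// One conditional branch on the way to the loop: "x <pred> bound", plus
// whether the loop is reached through the true or the false successor.
struct Guard {
  Pred pred;
  uint64_t bound;
  bool loopOnTrueEdge;
};

Pred inverse(Pred p) { return p == Pred::ULT ? Pred::UGE : Pred::ULT; }

// Inclusive unsigned upper bound for x on loop entry.
uint64_t upperBoundFromGuards(const std::vector<Guard> &guards) {
  uint64_t bound = UINT64_MAX;
  for (const Guard &g : guards) {
    // If the loop sits on the false edge, the inverse predicate holds there.
    Pred p = g.loopOnTrueEdge ? g.pred : inverse(g.pred);
    // Only "x ult C" is used. "x ult 0" gives no usable bound in this model
    // (C - 1 would wrap to the all-ones value), so it is skipped.
    if (p != Pred::ULT || g.bound == 0)
      continue;
    bound = std::min(bound, g.bound - 1);
  }
  return bound;
}
```

With guards `ult 16` and `ult 10` in either order the bound is 9, and a `uge 16` guard with the loop on the false edge gives 15, which lines up with the expectations in the test_multiple_const_guards_order1/order2 and test_guard_uge_16_branches_flipped tests.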

llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll

 ; RUN: opt -analyze -scalar-evolution %s -enable-new-pm=0 | FileCheck %s
 ; RUN: opt -passes='print<scalar-evolution>' -disable-output %s 2>&1 | FileCheck %s

 ; Test case for PR40961. The loop guard limit the max backedge-taken count.

 define void @test_guard_less_than_16(i32* nocapture %a, i64 %i) {
 ; CHECK-LABEL: Determining loop execution counts for: @test_guard_less_than_16
 ; CHECK-NEXT: Loop %loop: backedge-taken count is (15 + (-1 * %i))
-; CHECK-NEXT: Loop %loop: max backedge-taken count is -1
+; CHECK-NEXT: Loop %loop: max backedge-taken count is 15
 ; CHECK-NEXT: Loop %loop: Predicated backedge-taken count is (15 + (-1 * %i))
 ;
 entry:
   %cmp3 = icmp ult i64 %i, 16
   br i1 %cmp3, label %loop, label %exit

 loop:
   %iv = phi i64 [ %iv.next, %loop ], [ %i, %entry ]
[27 lines not shown]

 exit:
   ret void
 }

 define void @test_guard_uge_16_branches_flipped(i32* nocapture %a, i64 %i) {
 ; CHECK-LABEL: Determining loop execution counts for: @test_guard_uge_16_branches_flipped
 ; CHECK-NEXT: Loop %loop: backedge-taken count is (15 + (-1 * %i))
-; CHECK-NEXT: Loop %loop: max backedge-taken count is -1
+; CHECK-NEXT: Loop %loop: max backedge-taken count is 15
 ; CHECK-NEXT: Loop %loop: Predicated backedge-taken count is (15 + (-1 * %i))
 ;
 entry:
   %cmp3 = icmp uge i64 %i, 16
   br i1 %cmp3, label %exit, label %loop

 loop:
   %iv = phi i64 [ %iv.next, %loop ], [ %i, %entry ]
   %idx = getelementptr inbounds i32, i32* %a, i64 %iv
   store i32 1, i32* %idx, align 4
   %iv.next = add nuw nsw i64 %iv, 1
   %exitcond = icmp eq i64 %iv.next, 16
   br i1 %exitcond, label %exit, label %loop

 exit:
   ret void
 }

 define void @test_multiple_const_guards_order1(i32* nocapture %a, i64 %i) {
 ; CHECK-LABEL: @test_multiple_const_guards_order1
 ; CHECK: Loop %loop: backedge-taken count is %i
-; CHECK-NEXT: Loop %loop: max backedge-taken count is -1
+; CHECK-NEXT: Loop %loop: max backedge-taken count is 9
 ; CHECK-NEXT: Loop %loop: Predicated backedge-taken count is %i
 ;
 entry:
   %c.1 = icmp ult i64 %i, 16
   br i1 %c.1, label %guardbb, label %exit

 guardbb:
   %c.2 = icmp ult i64 %i, 10
[9 lines not shown]

 exit:
   ret void
 }

 define void @test_multiple_const_guards_order2(i32* nocapture %a, i64 %i) {
 ; CHECK-LABEL: @test_multiple_const_guards_order2
 ; CHECK: Loop %loop: backedge-taken count is %i
-; CHECK-NEXT: Loop %loop: max backedge-taken count is -1
+; CHECK-NEXT: Loop %loop: max backedge-taken count is 9
 ; CHECK-NEXT: Loop %loop: Predicated backedge-taken count is %i
 ;
 entry:
   %c.1 = icmp ult i64 %i, 10
   br i1 %c.1, label %guardbb, label %exit

 guardbb:
   %c.2 = icmp ult i64 %i, 16
[...]

llvm/test/Transforms/LoopVectorize/AArch64/pr36032.ll

[9 lines not shown]
 @c = local_unnamed_addr global [6 x i8] zeroinitializer, align 1
 @b = internal global %struct.anon zeroinitializer, align 1

 ; Function Attrs: noreturn nounwind
 define void @_Z1dv() local_unnamed_addr #0 {
 ; CHECK-LABEL: @_Z1dv(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[CALL:%.*]] = tail call i8* @"_ZN3$_01aEv"(%struct.anon* nonnull @b)
-; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, i8* [[CALL]], i64 4
 ; CHECK-NEXT: br label [[FOR_COND:%.*]]
 ; CHECK: for.cond:
 ; CHECK-NEXT: [[F_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD5:%.*]], [[FOR_COND_CLEANUP:%.*]] ]
 ; CHECK-NEXT: [[G_0:%.*]] = phi i32 [ undef, [[ENTRY]] ], [ [[G_1_LCSSA:%.*]], [[FOR_COND_CLEANUP]] ]
 ; CHECK-NEXT: [[CMP12:%.*]] = icmp ult i32 [[G_0]], 4
 ; CHECK-NEXT: [[CONV:%.*]] = and i32 [[F_0]], 65535
 ; CHECK-NEXT: br i1 [[CMP12]], label [[FOR_BODY_LR_PH:%.*]], label [[FOR_COND_CLEANUP]]
 ; CHECK: for.body.lr.ph:
 ; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[G_0]] to i64
-; CHECK-NEXT: [[TMP1:%.*]] = sub i64 4, [[TMP0]]
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], 4
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
-; CHECK: vector.scevcheck:
-; CHECK-NEXT: [[TMP2:%.*]] = sub i64 3, [[TMP0]]
-; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[G_0]], [[CONV]]
-; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[TMP2]] to i32
-; CHECK-NEXT: [[MUL:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 1, i32 [[TMP4]])
-; CHECK-NEXT: [[MUL_RESULT:%.*]] = extractvalue { i32, i1 } [[MUL]], 0
-; CHECK-NEXT: [[MUL_OVERFLOW:%.*]] = extractvalue { i32, i1 } [[MUL]], 1
-; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[TMP3]], [[MUL_RESULT]]
-; CHECK-NEXT: [[TMP6:%.*]] = sub i32 [[TMP3]], [[MUL_RESULT]]
-; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt i32 [[TMP6]], [[TMP3]]
-; CHECK-NEXT: [[TMP8:%.*]] = icmp ult i32 [[TMP5]], [[TMP3]]
-; CHECK-NEXT: [[TMP9:%.*]] = select i1 false, i1 [[TMP7]], i1 [[TMP8]]
-; CHECK-NEXT: [[TMP10:%.*]] = icmp ugt i64 [[TMP2]], 4294967295
-; CHECK-NEXT: [[TMP11:%.*]] = or i1 [[TMP9]], [[TMP10]]
-; CHECK-NEXT: [[TMP12:%.*]] = or i1 [[TMP11]], [[MUL_OVERFLOW]]
-; CHECK-NEXT: [[TMP13:%.*]] = or i1 false, [[TMP12]]
-; CHECK-NEXT: br i1 [[TMP13]], label [[SCALAR_PH]], label [[VECTOR_MEMCHECK:%.*]]
-; CHECK: vector.memcheck:
-; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, i8* [[CALL]], i64 [[TMP0]]
-; CHECK-NEXT: [[TMP14:%.*]] = add i32 [[G_0]], [[CONV]]
-; CHECK-NEXT: [[TMP15:%.*]] = zext i32 [[TMP14]] to i64
-; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr [6 x i8], [6 x i8]* @c, i64 0, i64 [[TMP15]]
-; CHECK-NEXT: [[TMP16:%.*]] = sub i64 [[TMP15]], [[TMP0]]
-; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, i8* getelementptr inbounds ([6 x i8], [6 x i8]* @c, i64 0, i64 4), i64 [[TMP16]]
-; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[SCEVGEP]], [[SCEVGEP3]]
-; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[SCEVGEP2]], [[SCEVGEP1]]
-; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
-; CHECK-NEXT: [[MEMCHECK_CONFLICT:%.*]] = and i1 [[FOUND_CONFLICT]], true
-; CHECK-NEXT: br i1 [[MEMCHECK_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
-; CHECK: vector.ph:
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP1]], 4
-; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP1]], [[N_MOD_VF]]
-; CHECK-NEXT: [[IND_END:%.*]] = add i64 [[TMP0]], [[N_VEC]]
-; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
-; CHECK: vector.body:
-; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i64 [[TMP0]], [[INDEX]]
-; CHECK-NEXT: [[TMP17:%.*]] = add i64 [[OFFSET_IDX]], 0
-; CHECK-NEXT: [[OFFSET_IDX4:%.*]] = add i64 [[TMP0]], [[INDEX]]
-; CHECK-NEXT: [[TMP18:%.*]] = trunc i64 [[OFFSET_IDX4]] to i32
-; CHECK-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 0
-; CHECK-NEXT: [[TMP20:%.*]] = add i32 [[CONV]], [[TMP19]]
-; CHECK-NEXT: [[TMP21:%.*]] = zext i32 [[TMP20]] to i64
-; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [6 x i8], [6 x i8]* @c, i64 0, i64 [[TMP21]]
-; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i8, i8* [[TMP22]], i32 0
-; CHECK-NEXT: [[TMP24:%.*]] = bitcast i8* [[TMP23]] to <4 x i8>*
-; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, <4 x i8>* [[TMP24]], align 1, !alias.scope !0
-; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i8, i8* [[CALL]], i64 [[TMP17]]
-; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i8, i8* [[TMP25]], i32 0
-; CHECK-NEXT: [[TMP27:%.*]] = bitcast i8* [[TMP26]] to <4 x i8>*
-; CHECK-NEXT: store <4 x i8> [[WIDE_LOAD]], <4 x i8>* [[TMP27]], align 1, !alias.scope !3, !noalias !0
-; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
-; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5
-; CHECK: middle.block:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP1]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
-; CHECK: scalar.ph:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[TMP0]], [[FOR_BODY_LR_PH]] ], [ [[TMP0]], [[VECTOR_SCEVCHECK]] ], [ [[TMP0]], [[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.cond.cleanup.loopexit:
 ; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
 ; CHECK: for.cond.cleanup:
-; CHECK-NEXT: [[G_1_LCSSA]] = phi i32 [ [[G_0]], [[FOR_COND]] ], [ 4, [[FOR_COND_CLEANUP_LOOPEXIT]] ]
+; CHECK-NEXT: [[G_1_LCSSA]] = phi i32 [ [[G_0]], [[FOR_COND]] ], [ 4, [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
 ; CHECK-NEXT: [[ADD5]] = add nuw nsw i32 [[CONV]], 4
 ; CHECK-NEXT: br label [[FOR_COND]]
 ; CHECK: for.body:
-; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT: [[TMP29:%.*]] = trunc i64 [[INDVARS_IV]] to i32
-; CHECK-NEXT: [[ADD:%.*]] = add i32 [[CONV]], [[TMP29]]
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[TMP0]], [[FOR_BODY_LR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV]] to i32
+; CHECK-NEXT: [[ADD:%.*]] = add i32 [[CONV]], [[TMP1]]
 ; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[ADD]] to i64
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [6 x i8], [6 x i8]* @c, i64 0, i64 [[IDXPROM]]
-; CHECK-NEXT: [[TMP30:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
+; CHECK-NEXT: [[TMP2:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
 ; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, i8* [[CALL]], i64 [[INDVARS_IV]]
-; CHECK-NEXT: store i8 [[TMP30]], i8* [[ARRAYIDX3]], align 1
+; CHECK-NEXT: store i8 [[TMP2]], i8* [[ARRAYIDX3]], align 1
 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 4
-; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop !7
+; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
 ;
 entry:
   %call = tail call i8* @"_ZN3$_01aEv"(%struct.anon* nonnull @b) #2
   br label %for.cond

 for.cond: ; preds = %for.cond.cleanup, %entry
   %f.0 = phi i32 [ 0, %entry ], [ %add5, %for.cond.cleanup ]
   %g.0 = phi i32 [ undef, %entry ], [ %g.1.lcssa, %for.cond.cleanup ]
[31 lines not shown]

llvm/unittests/Analysis/ScalarEvolutionTest.cpp

[...] runWithSE(*M, "foo", [](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
     auto *X = SE.getSCEV(getArgByName(F, "x"));
     auto *One = SE.getOne(X->getType());
     auto *Sum = SE.getAddExpr(X, One, SCEV::FlagNUW);
     EXPECT_TRUE(SE.isKnownPredicate(ICmpInst::ICMP_UGE, Sum, X));
     EXPECT_TRUE(SE.isKnownPredicate(ICmpInst::ICMP_UGT, Sum, X));
   });
 }

+TEST_F(ScalarEvolutionsTest, SCEVgetRanges) {
+  LLVMContext C;
+  SMDiagnostic Err;
+  std::unique_ptr<Module> M = parseAssemblyString(
+      "define void @foo(i32 %i) { "
+      "entry: "
+      " br label %loop.body "
+      "loop.body: "
+      " %iv = phi i32 [ %iv.next, %loop.body ], [ 0, %entry ] "
+      " %iv.next = add nsw i32 %iv, 1 "
+      " %cmp = icmp eq i32 %iv.next, 16 "
+      " br i1 %cmp, label %exit, label %loop.body "
+      "exit: "
+      " ret void "
+      "} ",
+      Err, C);
+
+  runWithSE(*M, "foo", [](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
+    auto *ScevIV = SE.getSCEV(getInstructionByName(F, "iv")); // {0,+,1}
+    auto *ScevI = SE.getSCEV(getArgByName(F, "i"));
+    EXPECT_EQ(SE.getUnsignedRange(ScevIV).getLower(), 0);
+    EXPECT_EQ(SE.getUnsignedRange(ScevIV).getUpper(), 16);
+
+    auto *Add = SE.getAddExpr(ScevI, ScevIV);
+    ValueToSCEVMapTy RewriteMap;
+    RewriteMap[cast<SCEVUnknown>(ScevI)->getValue()] =
+        SE.getUMinExpr(ScevI, SE.getConstant(ScevI->getType(), 17));
+    auto *AddWithUMin = SCEVParameterRewriter::rewrite(Add, SE, RewriteMap);
+    EXPECT_EQ(SE.getUnsignedRange(AddWithUMin).getLower(), 0);
+    EXPECT_EQ(SE.getUnsignedRange(AddWithUMin).getUpper(), 33);
+  });
+}
+
+TEST_F(ScalarEvolutionsTest, SCEVgetExitLimitForGuardedLoop) {
+  LLVMContext C;
+  SMDiagnostic Err;
+  std::unique_ptr<Module> M = parseAssemblyString(
+      "define void @foo(i32 %i) { "
+      "entry: "
+      " %cmp3 = icmp ult i32 %i, 16 "
+      " br i1 %cmp3, label %loop.body, label %exit "
+      "loop.body: "
+      " %iv = phi i32 [ %iv.next, %loop.body ], [ %i, %entry ] "
+      " %iv.next = add nsw i32 %iv, 1 "
+      " %cmp = icmp eq i32 %iv.next, 16 "
+      " br i1 %cmp, label %exit, label %loop.body "
+      "exit: "
+      " ret void "
+      "} ",
+      Err, C);
+
+  ASSERT_TRUE(M && "Could not parse module?");
+  ASSERT_TRUE(!verifyModule(*M) && "Must have been well formed!");
+
+  runWithSE(*M, "foo", [](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
+    auto *ScevIV = SE.getSCEV(getInstructionByName(F, "iv")); // {0,+,1}
+    const Loop *L = cast<SCEVAddRecExpr>(ScevIV)->getLoop();
+
+    const SCEV *BTC = SE.getBackedgeTakenCount(L);
+    EXPECT_FALSE(isa<SCEVConstant>(BTC));
+    const SCEV *MaxBTC = SE.getConstantMaxBackedgeTakenCount(L);
+    EXPECT_EQ(cast<SCEVConstant>(MaxBTC)->getAPInt(), 15);
+  });
+}
+
 } // end namespace llvm
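
The bounds expected by the SCEVgetRanges unit test (lower 0, upper 33) can be checked by hand with half-open interval arithmetic. The sketch below is only that arithmetic written out; it assumes no operand wraps, and `HalfOpenRange` and the helper names are hypothetical, not LLVM's ConstantRange.

```cpp
#include <cassert>
#include <cstdint>

// Half-open unsigned interval [lower, upper): upper is one past the largest
// value, which is how the unit test reads getLower() and getUpper().
struct HalfOpenRange {
  uint64_t lower, upper;
};

// umin(x, c) for a completely unknown x can be anything in [0, c].
HalfOpenRange uminWithConstant(uint64_t c) { return {0, c + 1}; }

// Sum of two ranges, assuming no wrap: the largest sum is
// (a.upper - 1) + (b.upper - 1), so the exclusive bound is one more.
HalfOpenRange add(HalfOpenRange a, HalfOpenRange b) {
  return {a.lower + b.lower, a.upper + b.upper - 1};
}
```

With %iv in [0, 16) and umin(%i, 17) in [0, 18), the sum lies in [0, 33): the largest value is 15 + 17 = 32.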
