This is an archive of the discontinued LLVM Phabricator instance.

Differential D147557

[SCEV] Improve AddRecs' range computation
Closed, Public

Authored by aleksandr.popov on Apr 4 2023, 12:20 PM.

Details

Reviewers

mkazantsev
nikic
fhahn
dmakogon

Commits

rG5b96b13fdffe: [SCEV] Improve AddRecs' range computation in Expensive Range Sharpening mode

Summary

Apply loop guards to AddRec's start in range computation for
non-self-wrapping AddRecs

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aleksandr.popov created this revision. Apr 4 2023, 12:20 PM

Herald added a project: Restricted Project. Apr 4 2023, 12:20 PM

Herald added subscribers: Groverkss, hiraditya.

aleksandr.popov requested review of this revision. Apr 4 2023, 12:20 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B223645: Diff 510890. Apr 4 2023, 3:22 PM

aleksandr.popov added reviewers: mkazantsev, nikic, fhahn, dmakogon. Apr 4 2023, 11:55 PM

Herald added a subscriber: StephenFan. Apr 4 2023, 11:55 PM

mkazantsev added inline comments. Apr 5 2023, 1:46 AM

llvm/lib/Analysis/ScalarEvolution.cpp
Line 6736: Do we also need `getRangeViaFactoring`?

mkazantsev added inline comments. Apr 5 2023, 2:02 AM

llvm/lib/Analysis/ScalarEvolution.cpp
Line 6711: Refactor by moving `const Loop *L = AddRec->getLoop();` earlier?

llvm/test/Analysis/ScalarEvolution/decrementing_addrecs.ll
Line 21: Remove the FIXMEs.

This is too expensive in terms of compile-time:
http://llvm-compile-time-tracker.com/compare.php?from=712dfec1781db8aa92782b98cac5517db548b7f9&to=2a04edbcecc77f3d80597662a4ba19f521874d62&stat=instructions:u

You're probably forcing an extra invalidation cycle here or something.

I'm also not sure whether the overall approach really makes sense. We shouldn't really get more information from the exit value than we already get from the addrec and the BE count. From what I can tell based on the test changes, the primary benefit here seems to be that we now take into account guard information on the addrec start value -- so a possible alternative would be to apply guards to the start value when calculating addrec ranges, or something along those lines?

This revision now requires changes to proceed. Apr 5 2023, 2:08 AM

Yeah. Why exactly could getRangeForAffineAR not infer the same facts? Maybe it's being over-conservative? I think we could do more in terms of symbolic computations, rather than range computations, in there.

aleksandr.popov updated this revision to Diff 511348. Apr 6 2023, 3:14 AM

aleksandr.popov edited the summary of this revision.

@nikic Thanks for the advice. I've applied loop guards to the AddRec's start value in getRangeForAffineNoSelfWrappingAR, and symbolic computations now work for the example from decrementing_addrecs.ll.
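
To illustrate why guarding the start value matters for this example, here is a minimal C++ sketch. It is not SCEV code: `Interval`, `startRangeWithoutGuard`, `startRangeWithGuard`, `endRange`, `unionOf`, and `isFullSet` are hypothetical names that only loosely stand in for `ConstantRange` operations. It hard-codes the case from decrementing_addrecs.ll, where Start is n - 1, the loop guard is n > 0, and the exit value is 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical closed interval [lo, hi] over signed 32-bit values. The bounds
// are stored as 64-bit integers so the arithmetic below cannot overflow.
// This is an illustration only, not llvm::ConstantRange.
struct Interval {
  int64_t lo;
  int64_t hi;
};

const int64_t kMin = std::numeric_limits<int32_t>::min();
const int64_t kMax = std::numeric_limits<int32_t>::max();

// Range of Start = n - 1 when nothing is known about n: any i32 value.
Interval startRangeWithoutGuard() { return {kMin, kMax}; }

// Range of Start = n - 1 once the guard n > 0 is used:
// n is in [1, kMax], so n - 1 is in [0, kMax - 1].
Interval startRangeWithGuard() { return {0, kMax - 1}; }

// The exit value of the decrementing recurrence in the test is the constant 0.
Interval endRange() { return {0, 0}; }

// Smallest interval that covers both inputs.
Interval unionOf(const Interval &a, const Interval &b) {
  return {std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// True if the interval covers every signed 32-bit value.
bool isFullSet(const Interval &r) { return r.lo == kMin && r.hi == kMax; }
```

Under these assumptions, `unionOf(startRangeWithoutGuard(), endRange())` is the full set, so nothing useful can be concluded, while `unionOf(startRangeWithGuard(), endRange())` is [0, 2147483646], which corresponds to the half-open range [0,2147483647) reported for %j in the EXPENSIVE_SHARPENING output of the test.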

Harbormaster completed remote builds in B223966: Diff 511348. Apr 6 2023, 3:51 AM

mkazantsev added inline comments. Apr 6 2023, 4:02 AM

llvm/test/Analysis/ScalarEvolution/decrementing_addrecs.ll
Lines 2–3: Let's have 2 run commands - with and without expensive checks, to highlight the difference.

Code change LG, I'll update this test with two different run commands to see what change comes from expensive sharpening and what from your patch.

Also please wait for Nikita to take a look.

aleksandr.popov updated this revision to Diff 511371. Apr 6 2023, 4:48 AM

Harbormaster completed remote builds in B223980: Diff 511371. Apr 6 2023, 5:30 AM

LG, but let's wait a few days in case anyone else has concerns/objections.

It would also be great if we could find a way to do it w/o expensive sharpening. There are usually several ways to achieve the same effect.

@aleksandr.popov during this chill period, could you please try to construct a test that shows benefit from your patch w/o expensive sharpening?

In D147557#4250538, @mkazantsev wrote:

LG, but let's wait a few days in case anyone else has concerns/objections.

It would also be great if we could find a way to do it w/o expensive sharpening. There are usually several ways to achieve the same effect.

@aleksandr.popov during this chill period, could you please try to construct a test that shows benefit from your patch w/o expensive sharpening?

The changed code is only used by expensive sharpening, so no :)

I tried applying loop guards to other places like getRangeForAffineAR() as well, but this again had a very bad compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=2caaec65c04ea7d0e9568b7895b7a46d6100cb75&to=87d1063758a038ffa7a0e358dc97c16ef6297aa3&stat=instructions:u

Here's the corresponding diff:
https://github.com/llvm/llvm-project/commit/87d1063758a038ffa7a0e358dc97c16ef6297aa3

I was hoping that doing this would be pretty cheap, but apparently it isn't.

Ah ok, since it's only in expensive sharpening, then it should be fine.

Nikita, what's the overall impact of expensive range sharpening together with this patch? It's sad to me that it is switched off by default. I may need to investigate why exactly it's so expensive and fix it.

This revision was not accepted when it landed; it landed in state Needs Review. Apr 10 2023, 2:48 AM

Closed by commit rG5b96b13fdffe: [SCEV] Improve AddRecs' range computation in Expensive Range Sharpening mode (authored by mkazantsev).

This revision was automatically updated to reflect the committed changes.

mkazantsev added a commit: rG5b96b13fdffe: [SCEV] Improve AddRecs' range computation in Expensive Range Sharpening mode.

In D147557#4255027, @mkazantsev wrote:

Nikita, what's the overall impact of expensive range sharpening together with this patch? It's sad to me that it is switched off by default. I may need to investigate why exactly it's so expensive and fix it.

Here are the compile-time results for just flipping the flag to true: http://llvm-compile-time-tracker.com/compare.php?from=b8917ac62ad49a0ce6de026c086599fc5fa35566&to=f0c3b441b6b01bb09c28c8608c5904d1910cd6d1&stat=instructions:u

Revision Contents

Path                                                          Size
llvm/lib/Analysis/ScalarEvolution.cpp                         2 lines
llvm/test/Analysis/ScalarEvolution/decrementing_addrecs.ll    12 lines

Diff 512102

llvm/lib/Analysis/ScalarEvolution.cpp

... 6,702 lines not shown (context: if (AddRec->hasNoSignedWrap()) {) ...
               getSignedRangeMax(AddRec->getStart()) +
                   1),
           RangeType);
     }

     // TODO: non-affine addrec
     if (AddRec->isAffine()) {
       const SCEV *MaxBECount =
           getConstantMaxBackedgeTakenCount(AddRec->getLoop());
       if (!isa<SCEVCouldNotCompute>(MaxBECount) &&
           getTypeSizeInBits(MaxBECount->getType()) <= BitWidth) {
         auto RangeFromAffine = getRangeForAffineAR(
             AddRec->getStart(), AddRec->getStepRecurrence(*this), MaxBECount,
             BitWidth);
         ConservativeResult =
             ConservativeResult.intersectWith(RangeFromAffine, RangeType);

         auto RangeFromFactoring = getRangeViaFactoring(
             AddRec->getStart(), AddRec->getStepRecurrence(*this), MaxBECount,
             BitWidth);
         ConservativeResult =
             ConservativeResult.intersectWith(RangeFromFactoring, RangeType);
       }

       // Now try symbolic BE count and more powerful methods.
       if (UseExpensiveRangeSharpening) {
         const SCEV *SymbolicMaxBECount =
             getSymbolicMaxBackedgeTakenCount(AddRec->getLoop());
         if (!isa<SCEVCouldNotCompute>(SymbolicMaxBECount) &&
             getTypeSizeInBits(MaxBECount->getType()) <= BitWidth &&
             AddRec->hasNoSelfWrap()) {
           auto RangeFromAffineNew = getRangeForAffineNoSelfWrappingAR(
               AddRec, SymbolicMaxBECount, BitWidth, SignHint);
           ConservativeResult =
               ConservativeResult.intersectWith(RangeFromAffineNew, RangeType);
         }
       }
     }

     return setRange(AddRec, SignHint, std::move(ConservativeResult));
   }
   case scUMaxExpr:
... 253 lines not shown (context: ConstantRange ScalarEvolution::getRangeForAffineNoSelfWrappingAR() ...
   //
   // Case 1: RangeMin ... Start V1 ... VN End ... RangeMax;
   // Case 2: RangeMin Vk ... V1 Start ... End Vn ... Vk + 1 RangeMax;
   //
   // No self wrap flag guarantees that the intermediate values cannot be BOTH
   // outside and inside the range [Min(Start, End), Max(Start, End)]. Using that
   // knowledge, let's try to prove that we are dealing with Case 1. It is so if
   // Start <= End and step is positive, or Start >= End and step is negative.
-  const SCEV *Start = AddRec->getStart();
+  const SCEV *Start = applyLoopGuards(AddRec->getStart(), AddRec->getLoop());
   ConstantRange StartRange = getRangeRef(Start, SignHint);
   ConstantRange EndRange = getRangeRef(End, SignHint);
   ConstantRange RangeBetween = StartRange.unionWith(EndRange);
   // If they already cover full iteration space, we will know nothing useful
   // even if we prove what we want to prove.
   if (RangeBetween.isFullSet())
     return RangeBetween;
   // Only deal with ranges that do not wrap (i.e. RangeMin < RangeMax).
... 8,396 lines not shown ...
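
The Case 1 reasoning in the comment of getRangeForAffineNoSelfWrappingAR can be checked exhaustively at a small bit width. The following is a sketch in plain C++ under stated assumptions, not LLVM code: `wrap8` and `case1HoldsForAll8BitAddRecs` are hypothetical helpers, the recurrence is modelled with signed 8-bit values, and "no self wrap" is modelled as |step| * n < 256 for n backedges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Wraps an arbitrary int to a signed 8-bit value in [-128, 127] using
// two's-complement (modulo 256) arithmetic.
int wrap8(int v) {
  int w = ((v % 256) + 256) % 256;
  return w >= 128 ? w - 256 : w;
}

// Checks every 8-bit affine recurrence {start,+,step} that does not self-wrap
// over n backedges. Whenever the Case 1 condition holds (start <= end with a
// positive step, or start >= end with a negative step, compared as signed
// values), every intermediate value must lie in [min(start, end), max(start, end)].
bool case1HoldsForAll8BitAddRecs() {
  for (int start = -128; start <= 127; ++start) {
    for (int step = -128; step <= 127; ++step) {
      if (step == 0)
        continue;
      // Modelled no-self-wrap condition: total distance stays below 2^8.
      for (int n = 1; std::abs(step) * n < 256; ++n) {
        int end = wrap8(start + step * n);
        bool isCase1 =
            (step > 0 && start <= end) || (step < 0 && start >= end);
        if (!isCase1)
          continue;
        int lo = std::min(start, end);
        int hi = std::max(start, end);
        for (int k = 0; k <= n; ++k) {
          int v = wrap8(start + step * k);
          if (v < lo || v > hi)
            return false; // an intermediate value escaped the range
        }
      }
    }
  }
  return true;
}
```

Calling `case1HoldsForAll8BitAddRecs()` from a small driver is enough to run the check. A recurrence such as start = 100, step = 10, n = 5 ends at `wrap8(150)`, a negative value, so it fails the Case 1 condition in the signed sense and is skipped rather than treated as a counterexample.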

llvm/test/Analysis/ScalarEvolution/decrementing_addrecs.ll

 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 2
 ; RUN: opt -disable-output "-passes=print<scalar-evolution>" < %s 2>&1 | FileCheck %s --check-prefix=DEFAULT
 ; RUN: opt -disable-output "-passes=print<scalar-evolution>" -scalar-evolution-use-expensive-range-sharpening < %s 2>&1 | FileCheck %s --check-prefix=EXPENSIVE_SHARPENING

 ; This test exercises the following scenario:
 ; given: n > 0
 ; for (i = 0, j = n - 1; i < n; i++, j--) {
 ; a = n - i;
 ; b = (n - 1) - i;
 ; c = 2147483647 - 1;
 ; }
 ;
 ; Note that value ranges of 'i' and 'j' are the same, just inverted. It means that
 ; they have same ranges and same no-wrap properties. 'b' is just an alternative
 ; way to compute the same value as 'j'. 'a' is effectively 'j + 1' and 'c' is a
 ; a positive value. All involved addrecs for 'i', 'j', 'a', 'b', 'c' should have
 ; no-sign-wrap flag.
 ;
 ; i's AddRec is expected to be proven no-sign-wrap
 ; j's AddRec is expected to be proven no-sign-wrap
 ; FIXME: a's AddRec is expected to be no-sign-wrap
 ; b's AddRec is expected to be no-sign-wrap
 ; FIXME: c's AddRec is expected to be no-sign-wrap
 ; i is expected to be non-negative
 ; j is expected to be non-negative
-; FIXME: a is expected to be positive
-; FIXME: b is expected to be non-negative
+; a is expected to be positive
+; b is expected to be non-negative
 ; c is expected to be positive
 define i32 @test_step_1_flags(i32 %n) {
 ; DEFAULT-LABEL: 'test_step_1_flags'
 ; DEFAULT-NEXT: Classifying expressions for: @test_step_1_flags
 ; DEFAULT-NEXT: %n.minus.1 = sub nsw i32 %n, 1
 ; DEFAULT-NEXT: --> (-1 + %n) U: full-set S: full-set
 ; DEFAULT-NEXT: %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
 ; DEFAULT-NEXT: --> {0,+,1}<nuw><nsw><%loop> U: [0,2147483647) S: [0,2147483647) Exits: (-1 + %n) LoopDispositions: { %loop: Computable }
... 19 lines not shown ...
 ;
 ; EXPENSIVE_SHARPENING-LABEL: 'test_step_1_flags'
 ; EXPENSIVE_SHARPENING-NEXT: Classifying expressions for: @test_step_1_flags
 ; EXPENSIVE_SHARPENING-NEXT: %n.minus.1 = sub nsw i32 %n, 1
 ; EXPENSIVE_SHARPENING-NEXT: --> (-1 + %n) U: full-set S: full-set
 ; EXPENSIVE_SHARPENING-NEXT: %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
 ; EXPENSIVE_SHARPENING-NEXT: --> {0,+,1}<nuw><nsw><%loop> U: [0,2147483647) S: [0,2147483647) Exits: (-1 + %n) LoopDispositions: { %loop: Computable }
 ; EXPENSIVE_SHARPENING-NEXT: %j = phi i32 [ %n.minus.1, %entry ], [ %j.next, %loop ]
-; EXPENSIVE_SHARPENING-NEXT: --> {(-1 + %n),+,-1}<nsw><%loop> U: full-set S: full-set Exits: 0 LoopDispositions: { %loop: Computable }
+; EXPENSIVE_SHARPENING-NEXT: --> {(-1 + %n),+,-1}<nsw><%loop> U: [0,2147483647) S: [0,2147483647) Exits: 0 LoopDispositions: { %loop: Computable }
 ; EXPENSIVE_SHARPENING-NEXT: %a = sub i32 %n, %i
-; EXPENSIVE_SHARPENING-NEXT: --> {%n,+,-1}<nw><%loop> U: full-set S: full-set Exits: 1 LoopDispositions: { %loop: Computable }
+; EXPENSIVE_SHARPENING-NEXT: --> {%n,+,-1}<nw><%loop> U: [1,-2147483648) S: [1,-2147483648) Exits: 1 LoopDispositions: { %loop: Computable }
 ; EXPENSIVE_SHARPENING-NEXT: %b = sub i32 %n.minus.1, %i
-; EXPENSIVE_SHARPENING-NEXT: --> {(-1 + %n),+,-1}<nsw><%loop> U: full-set S: full-set Exits: 0 LoopDispositions: { %loop: Computable }
+; EXPENSIVE_SHARPENING-NEXT: --> {(-1 + %n),+,-1}<nsw><%loop> U: [0,2147483647) S: [0,2147483647) Exits: 0 LoopDispositions: { %loop: Computable }
 ; EXPENSIVE_SHARPENING-NEXT: %c = sub i32 2147483647, %i
 ; EXPENSIVE_SHARPENING-NEXT: --> {2147483647,+,-1}<nw><%loop> U: [1,-2147483648) S: [1,-2147483648) Exits: (-2147483648 + (-1 * %n)) LoopDispositions: { %loop: Computable }
 ; EXPENSIVE_SHARPENING-NEXT: %i.next = add nuw nsw i32 %i, 1
 ; EXPENSIVE_SHARPENING-NEXT: --> {1,+,1}<nuw><nsw><%loop> U: [1,-2147483648) S: [1,-2147483648) Exits: %n LoopDispositions: { %loop: Computable }
 ; EXPENSIVE_SHARPENING-NEXT: %j.next = add nsw i32 %j, -1
-; EXPENSIVE_SHARPENING-NEXT: --> {(-2 + %n),+,-1}<nw><%loop> U: full-set S: full-set Exits: -1 LoopDispositions: { %loop: Computable }
+; EXPENSIVE_SHARPENING-NEXT: --> {(-2 + %n),+,-1}<nsw><%loop> U: full-set S: [-1,2147483646) Exits: -1 LoopDispositions: { %loop: Computable }
 ; EXPENSIVE_SHARPENING-NEXT: Determining loop execution counts for: @test_step_1_flags
 ; EXPENSIVE_SHARPENING-NEXT: Loop %loop: backedge-taken count is (-1 + %n)
 ; EXPENSIVE_SHARPENING-NEXT: Loop %loop: constant max backedge-taken count is 2147483646
 ; EXPENSIVE_SHARPENING-NEXT: Loop %loop: symbolic max backedge-taken count is (-1 + %n)
 ; EXPENSIVE_SHARPENING-NEXT: Loop %loop: Predicated backedge-taken count is (-1 + %n)
 ; EXPENSIVE_SHARPENING-NEXT: Predicates:
 ; EXPENSIVE_SHARPENING: Loop %loop: Trip multiple is 1
 ;
... 22 lines not shown ...
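
The value expectations listed in the test's header comment can be sanity-checked by running the described loop directly. This is a small C++ sketch, not part of the patch: `expectationsHold` is a hypothetical helper, and it takes c as 2147483647 - i, following the IR line `%c = sub i32 2147483647, %i` shown in the check lines.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Runs the loop from the test comment for one value of n and reports whether
// the stated expectations hold on every iteration:
//   i and j non-negative, a positive, b non-negative, c positive,
//   b equal to j, and a equal to j + 1.
// Returns false for n <= 0, since the scenario assumes n > 0.
bool expectationsHold(int32_t n) {
  const int32_t kMax = std::numeric_limits<int32_t>::max();
  if (n <= 0)
    return false;
  for (int32_t i = 0, j = n - 1; i < n; ++i, --j) {
    int32_t a = n - i;
    int32_t b = (n - 1) - i;
    int32_t c = kMax - i; // assumption: matches the IR, not the comment text
    if (i < 0 || j < 0 || a <= 0 || b < 0 || c <= 0)
      return false;
    if (b != j || a != j + 1)
      return false;
  }
  return true;
}
```

A loop over small values, for example n from 1 to 2000, is enough to exercise it; checking n close to 2147483647 is also valid but takes on the order of two billion iterations.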