This is an archive of the discontinued LLVM Phabricator instance.


Differential D50985

[SCEV] LoopsUsed memoization
Needs Revision · Public

Authored by rtereshin on Aug 20 2018, 11:41 AM.


Details

Reviewers

mkazantsev
efriedma
sanjoy

Summary

Currently, ScalarEvolution::getUsedLoops traverses a SCEV expression on each call
without caching the results. Because addToLoopUseLists calls it for every new SCEV node created,
this is time-consuming. This patch adds a memoization map to speed up the calls and tries
to do so in the least invasive manner possible.
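
The memoization idea can be sketched on a toy expression DAG. This is a minimal illustration, not the patch itself: ToyLoop, ToyExpr, ToyAnalysis, collect, addNode and the NodesVisited counter are hypothetical stand-ins invented here, and std::unordered_map/std::set replace the DenseMap/SmallPtrSet used in LLVM. Only the shape of the lookup (consult the cache first, skip the entry that is currently being filled) follows the diff below.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <vector>

// Toy stand-ins; these are NOT llvm::Loop or llvm::SCEV.
struct ToyLoop { int Id; };

struct ToyExpr {
  const ToyLoop *AddRecLoop = nullptr;   // non-null only for add-recurrence nodes
  std::vector<const ToyExpr *> Operands; // sub-expressions, created earlier
};

struct ToyAnalysis {
  // Hypothetical counterpart of the LoopsRefd map added by the patch.
  std::unordered_map<const ToyExpr *, std::set<const ToyLoop *>> LoopsRefd;
  unsigned NodesVisited = 0; // instrumentation for this sketch only

  // Collect the loops used by E into Out, reusing cached results if present.
  void collect(const ToyExpr *E, std::set<const ToyLoop *> &Out) {
    auto It = LoopsRefd.find(E);
    // Skip the cache entry that is being filled right now (it is still empty).
    if (It != LoopsRefd.end() && &It->second != &Out) {
      Out.insert(It->second.begin(), It->second.end());
      return; // do not descend into an already summarized subtree
    }
    ++NodesVisited;
    if (E->AddRecLoop)
      Out.insert(E->AddRecLoop);
    for (const ToyExpr *Op : E->Operands)
      collect(Op, Out);
  }

  // Register a newly created node; expected to run once per node.
  const std::set<const ToyLoop *> &addNode(const ToyExpr *E) {
    assert(LoopsRefd.find(E) == LoopsRefd.end() && "node registered twice");
    std::set<const ToyLoop *> &Used = LoopsRefd[E];
    collect(E, Used);
    return Used;
  }
};
```

In this sketch, registering a node whose operands are already cached visits only the new node itself, which is the effect the patch is after.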

This partially addresses https://bugs.llvm.org/show_bug.cgi?id=32731

On the large-SCEVs-shallow-getSCEV-stack.ll test case attached to the Bugzilla bug, I see a
~70% reduction in overall (wall) time for the run

time ./bin/opt -slsr large-SCEVs-shallow-getSCEV-stack.ll -o /dev/null

(about 98% of the remaining time, give or take, is still spent in checkValidity).

CTMark shows either a small improvement or a result within noise; it's hard to tell even
with 100 runs (x86):

Name	Prev (min)	Current (min)	% (min)	Δ (min)	MAD	Prev (median)	Current (median)	% (median)	Δ (median)	MAD
CTMark/ClamAV/clamscan	9.8202	9.8004	-0.20%	-0.0198	0.0754	9.9054	9.8899	-0.16%	-0.0155	0.0754
CTMark/kimwitu++/kc	11.0440	11.0182	-0.23%	-0.0258	0.0268	11.0805	11.0802	0.00%	-0.0003	0.0268
CTMark/tramp3d-v4/tramp3d-v4	11.6199	11.5813	-0.33%	-0.0386	0.0388	11.6906	11.6903	0.00%	-0.0003	0.0388
CTMark/7zip/7zip-benchmark	27.3075	27.2473	-0.22%	-0.0602	0.1511	27.8942	27.8941	0.00%	-0.0001	0.1511
CTMark/sqlite3/sqlite3	4.7515	4.7446	-0.15%	-0.0069	0.0278	4.7894	4.7893	0.00%	-0.0001	0.0278
CTMark/7zip/7zip-benchmark-link	0.0443	0.0438	-1.13%	-0.0005	0.0010	0.0457	0.0457	0.00%	0.0000	0.0010
CTMark/Bullet/bullet-link	0.0320	0.0319	-0.31%	-0.0001	0.0006	0.0329	0.0329	0.00%	0.0000	0.0006
CTMark/ClamAV/clamscan-link	0.0234	0.0234	0.00%	0.0000	0.0003	0.0238	0.0238	0.00%	0.0000	0.0003
CTMark/SPASS/SPASS-link	0.0227	0.0227	0.00%	0.0000	0.0002	0.0230	0.0230	0.00%	0.0000	0.0002
CTMark/consumer-typeset/consumer-typeset-link	0.0239	0.0239	0.00%	0.0000	0.0003	0.0246	0.0246	0.20%	0.0000	0.0003
CTMark/kimwitu++/kc-link	0.0528	0.0528	0.00%	0.0000	0.0003	0.0534	0.0534	0.09%	0.0000	0.0003
CTMark/lencod/lencod-link	0.0238	0.0238	0.00%	0.0000	0.0003	0.0246	0.0246	0.00%	0.0000	0.0003
CTMark/mafft/pairlocalalign-link	0.0174	0.0174	0.00%	0.0000	0.0001	0.0176	0.0176	0.28%	0.0000	0.0001
CTMark/sqlite3/sqlite3-link	0.0141	0.0141	0.00%	0.0000	0.0001	0.0143	0.0143	0.00%	0.0000	0.0001
CTMark/tramp3d-v4/tramp3d-v4-link	0.0238	0.0238	0.00%	0.0000	0.0001	0.0240	0.0240	0.00%	0.0000	0.0001
CTMark/consumer-typeset/consumer-typeset	7.6772	7.6499	-0.36%	-0.0273	0.0343	7.7051	7.7058	0.01%	0.0006	0.0343
CTMark/mafft/pairlocalalign	4.9020	4.8927	-0.19%	-0.0093	0.0199	4.9349	4.9355	0.01%	0.0006	0.0199
CTMark/SPASS/SPASS	9.0386	9.0233	-0.17%	-0.0153	0.0716	9.1167	9.1181	0.02%	0.0014	0.0716
CTMark/lencod/lencod	8.5570	8.5427	-0.17%	-0.0143	0.0397	8.5944	8.5960	0.02%	0.0016	0.0397
CTMark/Bullet/bullet	20.3327	20.3012	-0.15%	-0.0315	0.1420	20.7366	20.7445	0.04%	0.0079	0.1420

(The left half of the table uses the minimum as the aggregate function and the right half uses the median; the underlying data are the same 100 samples.
Rows are sorted by the absolute delta of medians.)

If the memory consumption becomes a concern we can switch from having a full set of loops referenced attached to every SCEV node
to a mirrored tree (DAG really) structure in a skip-list fashion so every node contains a set of unique references to all closest AddRec
nodes. This way getUsedLoops will be able to traverse a compressed tree (DAG) containing AddRec's only for any SCEV expression.
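
One possible reading of that alternative, sketched on toy types: every node keeps links only to its closest AddRec descendants, and the query walks those links instead of the full expression. All names here (ToyLoop, ToyNode, linkClosestAddRecs, usedLoops) are hypothetical and nothing like this exists in the patch; the sketch also assumes operands are linked before the nodes that use them.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct ToyLoop {};

struct ToyNode {
  const ToyLoop *AddRecLoop = nullptr;         // non-null only for AddRec nodes
  std::vector<const ToyNode *> Operands;       // full expression structure
  std::vector<const ToyNode *> ClosestAddRecs; // compressed "skip" links
};

// Fill N.ClosestAddRecs from its operands. An operand that is an AddRec is
// linked directly; any other operand contributes its own closest AddRecs.
void linkClosestAddRecs(ToyNode &N) {
  std::set<const ToyNode *> Seen; // keeps the links unique
  for (const ToyNode *Op : N.Operands) {
    if (Op->AddRecLoop) {
      if (Seen.insert(Op).second)
        N.ClosestAddRecs.push_back(Op);
      continue;
    }
    for (const ToyNode *AR : Op->ClosestAddRecs)
      if (Seen.insert(AR).second)
        N.ClosestAddRecs.push_back(AR);
  }
}

// Walk only AddRec nodes, following the compressed links.
std::set<const ToyLoop *> usedLoops(const ToyNode &N) {
  std::set<const ToyLoop *> Loops;
  std::set<const ToyNode *> Visited;
  std::vector<const ToyNode *> Worklist;
  if (N.AddRecLoop)
    Worklist.push_back(&N);
  else
    Worklist = N.ClosestAddRecs;
  while (!Worklist.empty()) {
    const ToyNode *AR = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(AR).second)
      continue; // shared sub-DAG, already handled
    Loops.insert(AR->AddRecLoop);
    Worklist.insert(Worklist.end(), AR->ClosestAddRecs.begin(),
                    AR->ClosestAddRecs.end());
  }
  return Loops;
}
```

The trade-off in this sketch is that each node stores a few links rather than a full set of loops, at the cost of a (shorter) traversal on every query.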

Diff Detail

Repository: rL LLVM

Event Timeline

rtereshin created this revision. Aug 20 2018, 11:41 AM

Herald added a subscriber: javed.absar. Aug 20 2018, 11:41 AM

I have a lot of doubts that it is correct. It might be, but I would request you to add some tests which aggressively exercise scenarios like "cached something -- called forgetMemoizedResults/forgetLoop -- attempted to get cached results". I am specifically worried about CFG changes and loop deletions.

lib/Analysis/ScalarEvolution.cpp
11763	I don't think it is sufficient for correctness. Imagine the situation: `A = {1,+,1}<some_loop>, B = A + 1`. Something has changed for A, and it is no longer using `some_loop`. How do we say that `B` is not using it as well? I also think that something needs to be done about it in `forgetLoop`. This method is called, for example, when a loop becomes non-existent, or when the set of its blocks changes. And you may end up with many cache bundles keeping references to this loop.

This revision now requires changes to proceed. Aug 20 2018, 7:28 PM

Hi Maxim,

Thank you for looking into this!

Given my current understanding, I believe that as long as (1) LoopsRefd is a proper inverse (or undefined) map of LoopUsers, we should be fine, unless (2) some passes misuse Scalar Evolution's APIs, for instance by not calling forgetLoop or forgetTopmostLoop on changed or deleted Loops.

For (1), I see LoopUsers getting changed only in a few places:
a) https://github.com/llvm-mirror/llvm/blob/5c1bd30b863cf271dbe70219b0bb5717d1e2ec7e/lib/Analysis/ScalarEvolution.cpp#L11813
b) https://github.com/llvm-mirror/llvm/blob/5c1bd30b863cf271dbe70219b0bb5717d1e2ec7e/lib/Analysis/ScalarEvolution.cpp#L6778

I believe in both cases the changes are reflected in LoopsRefd.
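
The "proper inverse (or undefined)" condition, and the cached / forgetLoop / re-query scenario requested above, can be illustrated with a toy model. This is a sketch under assumptions: ToyCaches, addExpr and isInverseOrUndefined are hypothetical names, std::map stands in for DenseMap, and the toy forgetLoop only mirrors the part of the real function that walks LoopUsers.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

struct ToyLoop {};
struct ToyExpr {};

struct ToyCaches {
  std::map<const ToyLoop *, std::vector<const ToyExpr *>> LoopUsers;
  std::map<const ToyExpr *, std::set<const ToyLoop *>> LoopsRefd;

  // Record a new expression together with the loops it uses.
  void addExpr(const ToyExpr *E, const std::set<const ToyLoop *> &Loops) {
    LoopsRefd[E] = Loops;
    for (const ToyLoop *L : Loops)
      LoopUsers[L].push_back(E);
  }

  // Toy counterpart of the LoopsRefd.erase(S) line added by the patch.
  void forgetMemoizedResults(const ToyExpr *E) { LoopsRefd.erase(E); }

  // Drop cached data for every expression that uses L, then drop L itself.
  void forgetLoop(const ToyLoop *L) {
    auto It = LoopUsers.find(L);
    if (It == LoopUsers.end())
      return;
    for (const ToyExpr *E : It->second)
      forgetMemoizedResults(E);
    LoopUsers.erase(It);
  }

  // True if every LoopsRefd entry is mirrored in LoopUsers, and every
  // LoopUsers entry is either mirrored in LoopsRefd or has no entry there.
  bool isInverseOrUndefined() const {
    for (const auto &Entry : LoopsRefd)
      for (const ToyLoop *L : Entry.second) {
        auto It = LoopUsers.find(L);
        if (It == LoopUsers.end() ||
            std::find(It->second.begin(), It->second.end(), Entry.first) ==
                It->second.end())
          return false;
      }
    for (const auto &Entry : LoopUsers)
      for (const ToyExpr *E : Entry.second) {
        auto It = LoopsRefd.find(E);
        if (It != LoopsRefd.end() && It->second.count(Entry.first) == 0)
          return false;
      }
    return true;
  }
};
```

In this model an expression that uses two loops loses its LoopsRefd entry as soon as either loop is forgotten, while it stays listed under the other loop; that is the "undefined" half of the condition.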

As for (2), if that's the case we already have a problem regardless of whether this patch is applied. Such a problem is also not going to be surfaced by testing Scalar Evolution's APIs, assuming such tests use the APIs as intended, which is probably a safe assumption.

I could also add that the CTMark compiles generated exactly the same binaries before and after this patch.

lib/Analysis/ScalarEvolution.cpp
11763

> I don't think it is sufficient for correctness. Imagine the situation: A = {1,+,1}<some_loop>, B = A + 1. Something has changed for A, and it is no longer using some_loop. How do we say that B is not using it as well?

The loop is part of the SCEV node's identity: https://github.com/llvm-mirror/llvm/blob/5c1bd30b863cf271dbe70219b0bb5717d1e2ec7e/lib/Analysis/ScalarEvolution.cpp#L3423

Therefore A cannot just stop using some_loop; it could only be recreated with a different loop attached, and it will be a different (by reference) node A'. Same for B: it will be a different node B' = A' + 1 with no cached results for either B' or A'. At least, this is my current understanding of what's going on here.

> I also think that something needs to be done about it in forgetLoop. This method is called, for example, when a loop becomes non-existent, or when the set of its blocks changes.

I believe it's already done: https://github.com/llvm-mirror/llvm/blob/5c1bd30b863cf271dbe70219b0bb5717d1e2ec7e/lib/Analysis/ScalarEvolution.cpp#L6774-L6779

> And you may end up with many cache bundles keeping references to this loop.

I think this is only possible if a pass using Scalar Evolution and changing loops violates the API and doesn't call `forgetLoop` on an invalidated Loop. If so, it's that pass's issue, not Scalar Evolution's, and heavy-testing Scalar Evolution's APIs won't surface such bugs at all. If `forgetLoop` is properly called, however, it would clean the cache for every SCEV referencing the Loop (as long as `LoopsRefd` is a proper inverse map of `LoopUsers`), so no stale bundles, I believe. Also, given that every SCEV B referencing a SCEV A references all the loops referenced by A (and potentially more), it will end up clearing this cache for every SCEV node (transitively) dependent on a directly affected SCEV.

It's probably also worth noting that `forgetLoop` "forgets" all the contained loops as well, not just the one explicitly specified.
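
The identity argument can be illustrated with a toy uniquing table. This is only a sketch: ToyUniquer and ToyAddRec are hypothetical, and a nested std::map stands in for the FoldingSet that ScalarEvolution uses to unique nodes. The point shown is that the loop is part of the lookup key, so the same operands with a different loop yield a different node pointer, and a cache keyed by node pointer holds nothing for the new node.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

struct ToyLoop {};

struct ToyAddRec {
  int Start;
  int Step;
  const ToyLoop *L;
};

class ToyUniquer {
  // Keyed first by loop, then by (Start, Step): the loop is part of the
  // node's identity, so it is part of the lookup key.
  std::map<const ToyLoop *,
           std::map<std::pair<int, int>, std::unique_ptr<ToyAddRec>>>
      Unique;

public:
  // Return the unique node for {Start,+,Step}<L>, creating it on first use.
  const ToyAddRec *getAddRec(int Start, int Step, const ToyLoop *L) {
    std::unique_ptr<ToyAddRec> &Slot = Unique[L][std::make_pair(Start, Step)];
    if (!Slot)
      Slot.reset(new ToyAddRec{Start, Step, L});
    return Slot.get();
  }
};
```

Asking twice for {1,+,1} over the same loop returns the same pointer, while asking for it over another loop returns a distinct node.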

sanjoy resigned from this revision. Jan 29 2022, 5:41 PM

Herald added a subscriber: pengfei. Jan 29 2022, 5:41 PM

Revision Contents

Path	Size
include/llvm/Analysis/ScalarEvolution.h	3 lines
lib/Analysis/ScalarEvolution.cpp	18 lines

Diff 161516

include/llvm/Analysis/ScalarEvolution.h

[... 1,850 lines not shown ...]
 private:
   FoldingSet<SCEV> UniqueSCEVs;
   FoldingSet<SCEVPredicate> UniquePreds;
   BumpPtrAllocator SCEVAllocator;

   /// This maps loops to a list of SCEV expressions that (transitively) use said
   /// loop.
   DenseMap<const Loop *, SmallVector<const SCEV *, 4>> LoopUsers;
+  /// The inverse of LoopUsers map; maps a SCEV expression to a set of
+  /// (transitively) referenced Loops.
+  DenseMap<const SCEV *, SmallPtrSet<const Loop *, 4>> LoopsRefd;

   /// Cache tentative mappings from UnknownSCEVs in a Loop, to a SCEV expression
   /// they can be rewritten into under certain predicates.
   DenseMap<std::pair<const SCEVUnknown *, const Loop *>,
            std::pair<const SCEV *, SmallVector<const SCEVPredicate *, 3>>>
       PredicatedSCEVRewrites;

   /// The head of a linked list of all SCEVUnknown values that have been
[... 148 lines not shown ...]

lib/Analysis/ScalarEvolution.cpp


[... 9,991 lines not shown ...]
   ValuesAtScopes.erase(S);
   LoopDispositions.erase(S);
   BlockDispositions.erase(S);
   UnsignedRanges.erase(S);
   SignedRanges.erase(S);
   ExprValueMap.erase(S);
   HasRecMap.erase(S);
   MinTrailingZerosCache.erase(S);
+  LoopsRefd.erase(S);

   for (auto I = PredicatedSCEVRewrites.begin();
        I != PredicatedSCEVRewrites.end();) {
     std::pair<const SCEV *, const Loop *> Entry = I->first;
     if (Entry.first == S)
       PredicatedSCEVRewrites.erase(I++);
     else
       ++I;
[... 14 lines not shown ...]
   RemoveSCEVFromBackedgeMap(BackedgeTakenCounts);
   RemoveSCEVFromBackedgeMap(PredicatedBackedgeTakenCounts);
 }

 void
 ScalarEvolution::getUsedLoops(const SCEV *S,
                               SmallPtrSetImpl<const Loop *> &LoopsUsed) {
   struct FindUsedLoops {
-    FindUsedLoops(SmallPtrSetImpl<const Loop *> &LoopsUsed)
-        : LoopsUsed(LoopsUsed) {}
+    FindUsedLoops(SmallPtrSetImpl<const Loop *> &LoopsUsed, ScalarEvolution &SE)
+        : LoopsUsed(LoopsUsed), SE(SE) {}
     SmallPtrSetImpl<const Loop *> &LoopsUsed;
+    ScalarEvolution &SE;

     bool follow(const SCEV *S) {
+      auto It = SE.LoopsRefd.find(S);
+      if (It != SE.LoopsRefd.end() && &It->second != &LoopsUsed) {
+        LoopsUsed.insert(It->second.begin(), It->second.end());
+        return false;
+      }
       if (auto *AR = dyn_cast<SCEVAddRecExpr>(S))
         LoopsUsed.insert(AR->getLoop());
       return true;
     }

     bool isDone() const { return false; }
   };

-  FindUsedLoops F(LoopsUsed);
+  FindUsedLoops F(LoopsUsed, *this);
   SCEVTraversal<FindUsedLoops>(F).visitAll(S);
 }

 void ScalarEvolution::addToLoopUseLists(const SCEV *S) {
-  SmallPtrSet<const Loop *, 8> LoopsUsed;
+  assert(LoopsRefd.find(S) == LoopsRefd.end() &&
+         "addToLoopUseLists should be called exactly once per every new SCEV");
+  SmallPtrSetImpl<const Loop *> &LoopsUsed = LoopsRefd[S];
   getUsedLoops(S, LoopsUsed);
   for (auto *L : LoopsUsed)
     LoopUsers[L].push_back(S);
 }

 void ScalarEvolution::verify() const {
   ScalarEvolution &SE = *const_cast<ScalarEvolution *>(this);
   ScalarEvolution SE2(F, TLI, AC, DT, LI);
[... 627 lines not shown ...]