This is an archive of the discontinued LLVM Phabricator instance.

Differential D37659

Improve ScalarEvolution::forgetLoop() performance
ClosedPublic

Authored by kariddi on Sep 9 2017, 2:47 AM.

Details

Reviewers

bkramer
sunfish
dblaikie
sanjoy

Commits

rGce90060d1c19: [ScalarEvolution] Refactor forgetLoop() to improve performance
rL312920: [ScalarEvolution] Refactor forgetLoop() to improve performance

Summary

In some tests I found forgetLoop() in ScalarEvolution can be pretty slow.

Specifically, in one test I measured, UnrollLoop took 17% of total compile time, with a good 10 of those 17 percentage points spent in ScalarEvolution::forgetLoop().

The code seems to churn through quite a lot of instructions in some worst-case scenarios. It also doesn't help that the algorithm is recursive and that the data structures are not reused across child loops.

I refactored it to be iterative, which lets us reuse the data structures when processing child loops, and I made the "Visited" set shared across the processing of all the sub-loops.

I'm not sure that is correct, because I'm not a SCEV expert, and I would like feedback on it. At a naive look, if an instruction has already been processed, I don't see why it should be processed again when it is encountered while processing a child loop: membership in a specific loop does not seem to be used by "forgetMemoizedResults()" or "eraseValueFromMap()".

These changes roughly halve the time taken by the method in my case, which is still not ideal but much better. With them, all the LLVM and Clang make tests still pass.
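The effect of sharing the "Visited" set can be seen in a minimal standalone sketch. Everything in it is a hypothetical stand-in rather than LLVM code: `ToyLoop` replaces `Loop`, plain `int`s replace instructions, `DefUse` replaces the IR use lists, and the returned count stands for the cache-eviction work done per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm::Loop: header PHIs (ints) plus sub-loops.
struct ToyLoop {
  std::vector<int> HeaderPhis;
  std::vector<const ToyLoop *> SubLoops;
};

// Toy def-use graph: Users[I] lists the instructions that use I.
using DefUse = std::vector<std::vector<int>>;

// Drains the worklist along def-use edges and returns how many instructions
// were processed (the real code would evict cache entries at that point).
static int drain(std::vector<int> &Worklist, std::unordered_set<int> &Visited,
                 const DefUse &Users) {
  int Processed = 0;
  while (!Worklist.empty()) {
    int I = Worklist.back();
    Worklist.pop_back();
    assert(I >= 0 && static_cast<std::size_t>(I) < Users.size());
    if (!Visited.insert(I).second)
      continue; // already handled
    ++Processed;
    for (int U : Users[static_cast<std::size_t>(I)])
      Worklist.push_back(U);
  }
  return Processed;
}

// Recursive shape: every loop gets a fresh worklist and a fresh Visited set.
int forgetLoopRecursive(const ToyLoop &L, const DefUse &Users) {
  std::vector<int> Worklist(L.HeaderPhis);
  std::unordered_set<int> Visited;
  int Processed = drain(Worklist, Visited, Users);
  for (const ToyLoop *Sub : L.SubLoops)
    Processed += forgetLoopRecursive(*Sub, Users);
  return Processed;
}

// Iterative shape: one loop worklist, with the instruction worklist and the
// Visited set reused across the root loop and all of its sub-loops.
int forgetLoopIterative(const ToyLoop &Root, const DefUse &Users) {
  std::vector<const ToyLoop *> LoopWorklist{&Root};
  std::vector<int> Worklist;
  std::unordered_set<int> Visited;
  int Processed = 0;
  while (!LoopWorklist.empty()) {
    const ToyLoop *CurrL = LoopWorklist.back();
    LoopWorklist.pop_back();
    Worklist.insert(Worklist.end(), CurrL->HeaderPhis.begin(),
                    CurrL->HeaderPhis.end());
    Processed += drain(Worklist, Visited, Users);
    LoopWorklist.insert(LoopWorklist.end(), CurrL->SubLoops.begin(),
                        CurrL->SubLoops.end());
  }
  return Processed;
}
```

For an outer loop with header PHI 0, an inner loop with header PHI 2, and def-use edges 0→1, 0→2, 1→3, 2→3, the recursive shape processes six instructions (2 and 3 are walked twice), while the iterative shape processes four. Whether skipping the repeated visits is safe for the real caches is exactly the question raised above; the sketch only shows where the saving comes from.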

I also experimented in adding this:

    if (It != ValueExprMap.end()) {
      eraseValueFromMap(It->first);
      forgetMemoizedResults(It->second);
      if (PHINode *PN = dyn_cast<PHINode>(I))
        ConstantEvolutionLoopExitValue.erase(PN);
-   }
+   } else
+     continue;

This basically makes the algorithm stop following the def-use chain when a value is not found in the ValueExprMap. I'm not sure this change is correct either, but it is worth noting that with it the method's share of compile time drops to 0.1% in my case (from 10% originally), and all the LLVM and Clang make tests still pass.
I tried this change purely on intuition, though. Does anybody know whether it is actually valid?
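To make the question concrete, here is a toy model of the pruned walk. It is only a sketch under stated assumptions: `Cache` is a hypothetical stand-in for ValueExprMap, instructions are plain `int`s, and the scenario where a user is cached while its operand is not is assumed for illustration, not demonstrated to occur in ScalarEvolution.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy def-use graph: Users[I] lists the instructions that use I.
using DefUse = std::vector<std::vector<int>>;

// Evicts cached entries reachable from Start along def-use edges.
// With PruneOnMiss set, the walk does not follow the users of an
// instruction that has no cache entry (the "else continue" variant).
// Returns the number of instructions visited.
int evictReachable(int Start, const DefUse &Users,
                   std::unordered_map<int, int> &Cache, bool PruneOnMiss) {
  std::vector<int> Worklist{Start};
  std::unordered_set<int> Visited;
  int NumVisited = 0;
  while (!Worklist.empty()) {
    int I = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(I).second)
      continue;
    ++NumVisited;
    auto It = Cache.find(I);
    if (It != Cache.end())
      Cache.erase(It);
    else if (PruneOnMiss)
      continue; // stop here: users of I are never examined
    for (int U : Users[static_cast<std::size_t>(I)])
      Worklist.push_back(U);
  }
  return NumVisited;
}
```

With a chain 0→1→2 where only 0 and 2 are cached, the full walk visits all three instructions and empties the cache, while the pruned walk stops at 1 and leaves the entry for 2 in place. The pruned variant is therefore equivalent only if a missing entry implies that no transitive user has one, which is the invariant that would need to be confirmed.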

Diff Detail

Repository: rL LLVM

Event Timeline

kariddi created this revision. Sep 9 2017, 2:47 AM

kariddi edited the summary of this revision.

kariddi edited the summary of this revision. Sep 9 2017, 2:51 AM

Fixed a typo

Corrected the wrong patch :P

This patch LGTM.

I don't think the modification you stated in the description is safe, but thinking about this a bit, I think ScalarEvolution::forgetLoop is fundamentally broken. It only looks at IR level PHI nodes and their users and evicts them from the caches, but not all things that need to be evicted can be found this way.

I think we'll need some sort of a per Loop use-list to implement forgetLoop correctly (which should also speed it up, at the cost of using more memory). I'll give this a shot.

lib/Analysis/ScalarEvolution.cpp
Line 6370 (on Diff #114490): Use `auto *`.
Line 6414 (on Diff #114490): This can be `LoopWorklist.append(CurrL->begin(), CurrL->end())`.

This revision is now accepted and ready to land. Sep 10 2017, 12:37 PM

Closed by commit rL312920: [ScalarEvolution] Refactor forgetLoop() to improve performance (authored by mggm). Sep 11 2017, 8:45 AM

This revision was automatically updated to reflect the committed changes.

Thanks, addressed the nitpicks

Revision Contents

Path: llvm/trunk/lib/Analysis/ScalarEvolution.cpp
Size: 85 lines

Diff 114622

llvm/trunk/lib/Analysis/ScalarEvolution.cpp

    @@ ScalarEvolution::getBackedgeTakenInfo(const Loop *L) {
       // loop), which would invalidate the iterator computed
       // earlier.
       return BackedgeTakenCounts.find(L)->second = std::move(Result);
     }

     void ScalarEvolution::forgetLoop(const Loop *L) {
       // Drop any stored trip count value.
       auto RemoveLoopFromBackedgeMap =
    -      [L](DenseMap<const Loop *, BackedgeTakenInfo> &Map) {
    +      [](DenseMap<const Loop *, BackedgeTakenInfo> &Map, const Loop *L) {
             auto BTCPos = Map.find(L);
             if (BTCPos != Map.end()) {
               BTCPos->second.clear();
               Map.erase(BTCPos);
             }
           };

    -  RemoveLoopFromBackedgeMap(BackedgeTakenCounts);
    -  RemoveLoopFromBackedgeMap(PredicatedBackedgeTakenCounts);
    +  SmallVector<const Loop *, 16> LoopWorklist(1, L);
    +  SmallVector<Instruction *, 32> Worklist;
    +  SmallPtrSet<Instruction *, 16> Visited;

    +  // Iterate over all the loops and sub-loops to drop SCEV information.
    +  while (!LoopWorklist.empty()) {
    +    auto *CurrL = LoopWorklist.pop_back_val();
    +
    +    RemoveLoopFromBackedgeMap(BackedgeTakenCounts, CurrL);
    +    RemoveLoopFromBackedgeMap(PredicatedBackedgeTakenCounts, CurrL);
    +
         // Drop information about predicated SCEV rewrites for this loop.
         for (auto I = PredicatedSCEVRewrites.begin();
              I != PredicatedSCEVRewrites.end();) {
           std::pair<const SCEV *, const Loop *> Entry = I->first;
    -      if (Entry.second == L)
    +      if (Entry.second == CurrL)
             PredicatedSCEVRewrites.erase(I++);
           else
             ++I;
         }

         // Drop information about expressions based on loop-header PHIs.
    -  SmallVector<Instruction *, 16> Worklist;
    -  PushLoopPHIs(L, Worklist);
    -
    -  SmallPtrSet<Instruction *, 8> Visited;
    +    PushLoopPHIs(CurrL, Worklist);
    +
         while (!Worklist.empty()) {
           Instruction *I = Worklist.pop_back_val();
           if (!Visited.insert(I).second)
             continue;

           ValueExprMapType::iterator It =
               ValueExprMap.find_as(static_cast<Value *>(I));
           if (It != ValueExprMap.end()) {
             eraseValueFromMap(It->first);
             forgetMemoizedResults(It->second);
             if (PHINode *PN = dyn_cast<PHINode>(I))
               ConstantEvolutionLoopExitValue.erase(PN);
           }

           PushDefUseChildren(I, Worklist);
         }

         for (auto I = ExitLimits.begin(); I != ExitLimits.end(); ++I) {
           auto &Query = I->first;
    -      if (Query.L == L)
    +      if (Query.L == CurrL)
             ExitLimits.erase(I);
         }

    +    LoopPropertiesCache.erase(CurrL);
         // Forget all contained loops too, to avoid dangling entries in the
         // ValuesAtScopes map.
    -  for (Loop *I : *L)
    -    forgetLoop(I);
    -
    -  LoopPropertiesCache.erase(L);
    +    LoopWorklist.append(CurrL->begin(), CurrL->end());
    +  }
     }

     void ScalarEvolution::forgetValue(Value *V) {
       Instruction *I = dyn_cast<Instruction>(V);
       if (!I) return;

       // Drop information about expressions based on loop-header PHIs.
       SmallVector<Instruction *, 16> Worklist;