This is an archive of the discontinued LLVM Phabricator instance.

[MemorySSA] Fix exponential compile-time updating MemorySSA.
ClosedPublic

Authored by efriedma on Mar 20 2018, 3:47 PM.


Details

Reviewers

Commits

rG88e2bac94d14: [MemorySSA] Fix exponential compile-time updating MemorySSA.
rL328577: [MemorySSA] Fix exponential compile-time updating MemorySSA.

Summary

MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm: for each block, it computes the previous definition for each predecessor, then takes those definitions and combines them. But currently it doesn't remember results it has already computed; this means it can visit the same block multiple times, which adds up to exponential time overall.

To fix this, this patch adds a cache. If we have already computed the result for a block, we don't need to visit it again because we'll come up with the same result. The one exception is when we RAUW (replace all uses of) a MemoryPhi; in that case, the TrackingVH stored in the cache will be updated automatically.
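As a rough illustration of why the cache matters, the following self-contained C++ sketch models the recursion on a toy CFG. Everything in it (ToyBlock, ToyWalker, buildDiamondChain, countVisits, the integer def ids) is hypothetical and not LLVM API, and it deliberately ignores loops, VisitedBlocks and real phi creation. It only counts how often the recursive step runs on a series of if/else diamonds, with and without memoization.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy stand-in for a basic block (hypothetical, not LLVM API).
struct ToyBlock {
  std::vector<int> Preds; // indices of predecessor blocks
  int LocalDef = -1;      // id of a def inside the block, or -1 for none
};

struct ToyWalker {
  const std::vector<ToyBlock> &Blocks;
  bool UseCache;
  std::unordered_map<int, int> Cache; // block index -> previous def id
  long Visits = 0;                    // times the recursive step really ran

  ToyWalker(const std::vector<ToyBlock> &B, bool C) : Blocks(B), UseCache(C) {}

  // Plays the role of getPreviousDefFromEnd: prefer a def in the block itself.
  int fromEnd(int BB) {
    if (Blocks[BB].LocalDef != -1)
      return Blocks[BB].LocalDef;
    return recursive(BB);
  }

  // Plays the role of getPreviousDefRecursive, for acyclic CFGs only.
  int recursive(int BB) {
    if (UseCache) {
      auto It = Cache.find(BB);
      if (It != Cache.end())
        return It->second;
    }
    ++Visits;
    int Result = -1;
    bool SawOperand = false, NeedPhi = false;
    for (int Pred : Blocks[BB].Preds) {
      int Def = fromEnd(Pred);
      if (SawOperand && Def != Result)
        NeedPhi = true;
      Result = Def;
      SawOperand = true;
    }
    if (NeedPhi)
      Result = 1000 + BB; // made-up id standing in for a new phi
    if (UseCache)
      Cache.emplace(BB, Result);
    return Result;
  }
};

// Entry block with the only def, followed by NumIfs if/else diamonds in a row.
std::vector<ToyBlock> buildDiamondChain(int NumIfs) {
  std::vector<ToyBlock> Blocks(1);
  Blocks[0].LocalDef = 0;
  int Head = 0;
  for (int I = 0; I < NumIfs; ++I) {
    int Then = static_cast<int>(Blocks.size());
    ToyBlock Arm;
    Arm.Preds = {Head};
    Blocks.push_back(Arm); // "then" arm
    Blocks.push_back(Arm); // "else" arm
    ToyBlock Join;
    Join.Preds = {Then, Then + 1};
    Blocks.push_back(Join);
    Head = Then + 2; // the join is the head of the next diamond
  }
  return Blocks;
}

// Queries the previous def at the last join and reports the visit count.
long countVisits(int NumIfs, bool UseCache) {
  std::vector<ToyBlock> Blocks = buildDiamondChain(NumIfs);
  ToyWalker W(Blocks, UseCache);
  int Def = W.recursive(static_cast<int>(Blocks.size()) - 1);
  assert(Def == 0 && "the entry def reaches every block in this toy CFG");
  return W.Visits;
}
```

In this toy model, countVisits(10, false) comes to 3069 recursive visits while countVisits(10, true) comes to 30, one per block without a local def. The uncached count roughly doubles with each extra diamond, which is the shape of blow-up the summary describes.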

I'm not sure this is the best fix, but it seems to work.

The testcase isn't really a test for the bug, but it adds coverage for the case where tryRemoveTrivialPhi erases an existing PHI node. (It's hard to write a good regression test for a performance issue.)

Diff Detail

Repository: rL LLVM

Event Timeline

efriedma created this revision. Mar 20 2018, 3:47 PM

Herald added a subscriber: Prazek. Mar 20 2018, 3:47 PM

So, staring at https://pp.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf, I don't see how we can get exponential time, and thus wonder if I screwed this up somehow.
I can see that we walk blocks to find the defs and they don't have to (because of how they use it), but that should only add a factor of O(N).

In particular, I believe each block should only ever be visited twice.

They should also have the same time bounds as us if both algorithms operate on blocks that each contain one instruction.
If the exponential factor comes from repeated walking, they should be just as exponential.
Thus, my assumption is that we screwed up.

(But otherwise, if we discover the paper is wrong, etc., this change looks reasonable.)

I think the problem comes from the way we implement the marker variation.

In the main algorithm, readVariableRecursive only calls itself recursively in the case where a block has one predecessor. In this case, it's theoretically possible to visit a block an arbitrary number of times if you have a deeply nested if statement, but that's at worst quadratic in the number of nested if statements. (I'm not sure the complexity analysis in the paper accounts for this correctly, but it probably doesn't have much practical impact.)

The exponential-time problem comes from the additional recursive calls introduced by the marker variation, I think. The variation isn't fully described in the paper, but adding recursive calls for blocks with multiple predecessors makes the algorithm exponential if you don't use some sort of cache to stop the recursion.

(There is a longer report version that describes the marker variation in detail, which is escaping me at the moment. There is also code that implements it in clang.)

So, staring at it, I think the main difference is in fact, as you say, that they have an effective cache and we don't.
In particular, the marker variation code looks at the variable map before continuing to traverse blocks, acting exactly like the cache you've implemented. It stores the ongoing results in writeVariable (which we don't have) to avoid having to find them again.
(I didn't notice this at first because the coding style is truly horrible.)

So, IMHO, what you've done seems exactly right.

We should be updating CachedPreviousDef for phis we insert/remove as well.
(IOW, it should be updated where you see the writeVariable calls in the algorithm, I believe.)

This revision now requires changes to proceed. Mar 21 2018, 2:59 PM

Update the MemoryAccess cache for blocks with multiple predecessors.

LGTM, thanks!

(I do wonder whether we should add a unit test to the MemorySSAUpdater unit tests that just has 100 nested ifs or something similar, which would take forever with exponential time but be fine with N^2 time, but otherwise, ...)
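One hypothetical way to produce an input of that kind is to generate textual IR for a long run of if/else diamonds between a store and a load. The sketch below assumes a chain of sequential diamonds is the shape of interest (matching the "series of if statements" comment in the patch) and uses the typed-pointer IR syntax of the test in this revision. makeDiamondChainIR and countOccurrences are made-up helpers, not LLVM API, and whether this exact shape reproduces the old exponential behaviour in any given pass has not been checked here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of LLVM: emits textual IR for a function
// with a store in the entry block, NumIfs back-to-back if/else diamonds
// containing no memory accesses, and a load in the final block.
std::string makeDiamondChainIR(unsigned NumIfs) {
  std::string IR = "define i32 @chain(i32* %p, i1 %c) {\n"
                   "entry:\n"
                   "  store i32 0, i32* %p\n"
                   "  br label %head0\n";
  for (unsigned I = 0; I != NumIfs; ++I) {
    std::string N = std::to_string(I);
    std::string Next = std::to_string(I + 1);
    IR += "head" + N + ":\n";
    IR += "  br i1 %c, label %then" + N + ", label %else" + N + "\n";
    IR += "then" + N + ":\n  br label %head" + Next + "\n";
    IR += "else" + N + ":\n  br label %head" + Next + "\n";
  }
  IR += "head" + std::to_string(NumIfs) + ":\n";
  IR += "  %v = load i32, i32* %p\n"
        "  ret i32 %v\n"
        "}\n";
  return IR;
}

// Counts non-overlapping occurrences of Needle in Text.
std::size_t countOccurrences(const std::string &Text,
                             const std::string &Needle) {
  assert(!Needle.empty() && "searching for an empty string never advances");
  std::size_t Count = 0;
  for (std::size_t Pos = Text.find(Needle); Pos != std::string::npos;
       Pos = Text.find(Needle, Pos + Needle.size()))
    ++Count;
  return Count;
}
```

Calling makeDiamondChainIR(100) yields a function with 100 conditional branches. A unit test along the lines suggested above would presumably build the same CFG through the C++ API and then insert a def with MemorySSAUpdater, rather than going through text.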

This revision is now accepted and ready to land. Mar 23 2018, 3:18 PM

Closed by commit rL328577: [MemorySSA] Fix exponential compile-time updating MemorySSA. (authored by efriedma). Mar 26 2018, 12:55 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                        Size
llvm/trunk/include/llvm/Analysis/MemorySSAUpdater.h         9 lines
llvm/trunk/lib/Analysis/MemorySSAUpdater.cpp                43 lines
llvm/trunk/test/Transforms/GVNHoist/hoist-simplify-phi.ll   54 lines

Diff 139840

llvm/trunk/include/llvm/Analysis/MemorySSAUpdater.h

[... 37 lines not shown ...]
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/OperandTraits.h"		#include "llvm/IR/OperandTraits.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Use.h"		#include "llvm/IR/Use.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
		#include "llvm/IR/ValueHandle.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"

namespace llvm {		namespace llvm {

class Function;		class Function;
class Instruction;		class Instruction;
[... 82 lines not shown ...]		public:
void removeMemoryAccess(MemoryAccess *);		void removeMemoryAccess(MemoryAccess *);

private:		private:
// Move What before Where in the MemorySSA IR.		// Move What before Where in the MemorySSA IR.
template <class WhereType>		template <class WhereType>
void moveTo(MemoryUseOrDef *What, BasicBlock *BB, WhereType Where);		void moveTo(MemoryUseOrDef *What, BasicBlock *BB, WhereType Where);
MemoryAccess *getPreviousDef(MemoryAccess *);		MemoryAccess *getPreviousDef(MemoryAccess *);
MemoryAccess *getPreviousDefInBlock(MemoryAccess *);		MemoryAccess *getPreviousDefInBlock(MemoryAccess *);
MemoryAccess *getPreviousDefFromEnd(BasicBlock *);		MemoryAccess *
MemoryAccess *getPreviousDefRecursive(BasicBlock *);		getPreviousDefFromEnd(BasicBlock *,
		DenseMap<BasicBlock *, TrackingVH<MemoryAccess>> &);
		MemoryAccess *
		getPreviousDefRecursive(BasicBlock *,
		DenseMap<BasicBlock *, TrackingVH<MemoryAccess>> &);
MemoryAccess *recursePhi(MemoryAccess *Phi);		MemoryAccess *recursePhi(MemoryAccess *Phi);
template <class RangeType>		template <class RangeType>
MemoryAccess *tryRemoveTrivialPhi(MemoryPhi *Phi, RangeType &Operands);		MemoryAccess *tryRemoveTrivialPhi(MemoryPhi *Phi, RangeType &Operands);
void fixupDefs(const SmallVectorImpl<MemoryAccess *> &);		void fixupDefs(const SmallVectorImpl<MemoryAccess *> &);
};		};
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_ANALYSIS_MEMORYSSAUPDATER_H		#endif // LLVM_ANALYSIS_MEMORYSSAUPDATER_H

llvm/trunk/lib/Analysis/MemorySSAUpdater.cpp

[... 31 lines not shown ...]
// Static Single Assignment Form"		// Static Single Assignment Form"
// The simple, non-marker algorithm places phi nodes at any join		// The simple, non-marker algorithm places phi nodes at any join
// Here, we place markers, and only place phi nodes if they end up necessary.		// Here, we place markers, and only place phi nodes if they end up necessary.
// They are only necessary if they break a cycle (IE we recursively visit		// They are only necessary if they break a cycle (IE we recursively visit
// ourselves again), or we discover, while getting the value of the operands,		// ourselves again), or we discover, while getting the value of the operands,
// that there are two or more definitions needing to be merged.		// that there are two or more definitions needing to be merged.
// This still will leave non-minimal form in the case of irreducible control		// This still will leave non-minimal form in the case of irreducible control
// flow, where phi nodes may be in cycles with themselves, but unnecessary.		// flow, where phi nodes may be in cycles with themselves, but unnecessary.
MemoryAccess *MemorySSAUpdater::getPreviousDefRecursive(BasicBlock *BB) {		MemoryAccess *MemorySSAUpdater::getPreviousDefRecursive(
		BasicBlock *BB,
		DenseMap<BasicBlock *, TrackingVH<MemoryAccess>> &CachedPreviousDef) {
		// First, do a cache lookup. Without this cache, certain CFG structures
		// (like a series of if statements) take exponential time to visit.
		auto Cached = CachedPreviousDef.find(BB);
		if (Cached != CachedPreviousDef.end()) {
		return Cached->second;
		} else if (BasicBlock *Pred = BB->getSinglePredecessor()) {
// Single predecessor case, just recurse, we can only have one definition.		// Single predecessor case, just recurse, we can only have one definition.
if (BasicBlock *Pred = BB->getSinglePredecessor()) {		MemoryAccess *Result = getPreviousDefFromEnd(Pred, CachedPreviousDef);
return getPreviousDefFromEnd(Pred);		CachedPreviousDef.insert({BB, Result});
		return Result;
} else if (VisitedBlocks.count(BB)) {		} else if (VisitedBlocks.count(BB)) {
// We hit our node again, meaning we had a cycle, we must insert a phi		// We hit our node again, meaning we had a cycle, we must insert a phi
// node to break it so we have an operand. The only case this will		// node to break it so we have an operand. The only case this will
// insert useless phis is if we have irreducible control flow.		// insert useless phis is if we have irreducible control flow.
return MSSA->createMemoryPhi(BB);		MemoryAccess *Result = MSSA->createMemoryPhi(BB);
		CachedPreviousDef.insert({BB, Result});
		return Result;
} else if (VisitedBlocks.insert(BB).second) {		} else if (VisitedBlocks.insert(BB).second) {
// Mark us visited so we can detect a cycle		// Mark us visited so we can detect a cycle
SmallVector<MemoryAccess *, 8> PhiOps;		SmallVector<MemoryAccess *, 8> PhiOps;

// Recurse to get the values in our predecessors for placement of a		// Recurse to get the values in our predecessors for placement of a
// potential phi node. This will insert phi nodes if we cycle in order to		// potential phi node. This will insert phi nodes if we cycle in order to
// break the cycle and have an operand.		// break the cycle and have an operand.
for (auto *Pred : predecessors(BB))		for (auto *Pred : predecessors(BB))
PhiOps.push_back(getPreviousDefFromEnd(Pred));		PhiOps.push_back(getPreviousDefFromEnd(Pred, CachedPreviousDef));

// Now try to simplify the ops to avoid placing a phi.		// Now try to simplify the ops to avoid placing a phi.
// This may return null if we never created a phi yet, that's okay		// This may return null if we never created a phi yet, that's okay
MemoryPhi *Phi = dyn_cast_or_null<MemoryPhi>(MSSA->getMemoryAccess(BB));		MemoryPhi *Phi = dyn_cast_or_null<MemoryPhi>(MSSA->getMemoryAccess(BB));
bool PHIExistsButNeedsUpdate = false;		bool PHIExistsButNeedsUpdate = false;
// See if the existing phi operands match what we need.		// See if the existing phi operands match what we need.
// Unlike normal SSA, we only allow one phi node per block, so we can't just		// Unlike normal SSA, we only allow one phi node per block, so we can't just
// create a new one.		// create a new one.
[... 19 lines not shown ...]		if (Result == Phi) {
Phi->addIncoming(PhiOps[i++], Pred);		Phi->addIncoming(PhiOps[i++], Pred);
InsertedPHIs.push_back(Phi);		InsertedPHIs.push_back(Phi);
}		}
Result = Phi;		Result = Phi;
}		}

// Set ourselves up for the next variable by resetting visited state.		// Set ourselves up for the next variable by resetting visited state.
VisitedBlocks.erase(BB);		VisitedBlocks.erase(BB);
		CachedPreviousDef.insert({BB, Result});
return Result;		return Result;
}		}
llvm_unreachable("Should have hit one of the three cases above");		llvm_unreachable("Should have hit one of the three cases above");
}		}

// This starts at the memory access, and goes backwards in the block to find the		// This starts at the memory access, and goes backwards in the block to find the
// previous definition. If a definition is not found the block of the access,		// previous definition. If a definition is not found the block of the access,
// it continues globally, creating phi nodes to ensure we have a single		// it continues globally, creating phi nodes to ensure we have a single
// definition.		// definition.
MemoryAccess *MemorySSAUpdater::getPreviousDef(MemoryAccess *MA) {		MemoryAccess *MemorySSAUpdater::getPreviousDef(MemoryAccess *MA) {
auto *LocalResult = getPreviousDefInBlock(MA);		if (auto *LocalResult = getPreviousDefInBlock(MA))
		return LocalResult;
return LocalResult ? LocalResult : getPreviousDefRecursive(MA->getBlock());		DenseMap<BasicBlock *, TrackingVH<MemoryAccess>> CachedPreviousDef;
		return getPreviousDefRecursive(MA->getBlock(), CachedPreviousDef);
}		}

// This starts at the memory access, and goes backwards in the block to the find		// This starts at the memory access, and goes backwards in the block to the find
// the previous definition. If the definition is not found in the block of the		// the previous definition. If the definition is not found in the block of the
// access, it returns nullptr.		// access, it returns nullptr.
MemoryAccess *MemorySSAUpdater::getPreviousDefInBlock(MemoryAccess *MA) {		MemoryAccess *MemorySSAUpdater::getPreviousDefInBlock(MemoryAccess *MA) {
auto *Defs = MSSA->getWritableBlockDefs(MA->getBlock());		auto *Defs = MSSA->getWritableBlockDefs(MA->getBlock());

[... 14 lines not shown ...]		if (!isa<MemoryUse>(MA)) {
// Note that if MA comes before Defs->begin(), we won't hit a def.		// Note that if MA comes before Defs->begin(), we won't hit a def.
return nullptr;		return nullptr;
}		}
}		}
return nullptr;		return nullptr;
}		}

// This starts at the end of block		// This starts at the end of block
MemoryAccess *MemorySSAUpdater::getPreviousDefFromEnd(BasicBlock *BB) {		MemoryAccess *MemorySSAUpdater::getPreviousDefFromEnd(
		BasicBlock *BB,
		DenseMap<BasicBlock *, TrackingVH<MemoryAccess>> &CachedPreviousDef) {
auto *Defs = MSSA->getWritableBlockDefs(BB);		auto *Defs = MSSA->getWritableBlockDefs(BB);

if (Defs)		if (Defs)
return &*Defs->rbegin();		return &*Defs->rbegin();

return getPreviousDefRecursive(BB);		return getPreviousDefRecursive(BB, CachedPreviousDef);
}		}
// Recurse over a set of phi uses to eliminate the trivial ones		// Recurse over a set of phi uses to eliminate the trivial ones
MemoryAccess *MemorySSAUpdater::recursePhi(MemoryAccess *Phi) {		MemoryAccess *MemorySSAUpdater::recursePhi(MemoryAccess *Phi) {
if (!Phi)		if (!Phi)
return nullptr;		return nullptr;
TrackingVH<MemoryAccess> Res(Phi);		TrackingVH<MemoryAccess> Res(Phi);
SmallVector<TrackingVH<Value>, 8> Uses;		SmallVector<TrackingVH<Value>, 8> Uses;
std::copy(Phi->user_begin(), Phi->user_end(), std::back_inserter(Uses));		std::copy(Phi->user_begin(), Phi->user_end(), std::back_inserter(Uses));
[... 74 lines not shown ...]
// construction algorithm.		// construction algorithm.
// Then, we update the defs below us (and any new phi nodes) in the graph to		// Then, we update the defs below us (and any new phi nodes) in the graph to
// point to the correct new defs, to ensure we only have one variable, and no		// point to the correct new defs, to ensure we only have one variable, and no
// disconnected stores.		// disconnected stores.
void MemorySSAUpdater::insertDef(MemoryDef *MD, bool RenameUses) {		void MemorySSAUpdater::insertDef(MemoryDef *MD, bool RenameUses) {
InsertedPHIs.clear();		InsertedPHIs.clear();

// See if we had a local def, and if not, go hunting.		// See if we had a local def, and if not, go hunting.
MemoryAccess *DefBefore = getPreviousDefInBlock(MD);		MemoryAccess *DefBefore = getPreviousDef(MD);
bool DefBeforeSameBlock = DefBefore != nullptr;		bool DefBeforeSameBlock = DefBefore->getBlock() == MD->getBlock();
if (!DefBefore)
DefBefore = getPreviousDefRecursive(MD->getBlock());

// There is a def before us, which means we can replace any store/phi uses		// There is a def before us, which means we can replace any store/phi uses
// of that thing with us, since we are in the way of whatever was there		// of that thing with us, since we are in the way of whatever was there
// before.		// before.
// We now define that def's memorydefs and memoryphis		// We now define that def's memorydefs and memoryphis
if (DefBeforeSameBlock) {		if (DefBeforeSameBlock) {
for (auto UI = DefBefore->use_begin(), UE = DefBefore->use_end();		for (auto UI = DefBefore->use_begin(), UE = DefBefore->use_end();
UI != UE;) {		UI != UE;) {
[... 241 lines not shown ...]

llvm/trunk/test/Transforms/GVNHoist/hoist-simplify-phi.ll

				; RUN: opt < %s -gvn-hoist -S | FileCheck %s

				; This test is meant to make sure that MemorySSAUpdater works correctly
				; in non-trivial cases.

				; CHECK: if.else218:
				; CHECK-NEXT: %0 = getelementptr inbounds %s, %s* undef, i32 0, i32 0
				; CHECK-NEXT: %1 = load i32, i32* %0, align 4

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

				%s = type { i32, %s**, [3 x i8], i8 }

				define void @test() {
				entry:
				br label %cond.end118

				cond.end118: ; preds = %entry
				br i1 undef, label %cleanup, label %if.end155

				if.end155: ; preds = %cond.end118
				br label %while.cond

				while.cond: ; preds = %while.body, %if.end155
				br i1 undef, label %while.end, label %while.body

				while.body: ; preds = %while.cond
				br label %while.cond

				while.end: ; preds = %while.cond
				switch i32 undef, label %if.else218 [
				i32 1, label %cleanup
				i32 0, label %if.then174
				]

				if.then174: ; preds = %while.end
				unreachable

				if.else218: ; preds = %while.end
				br i1 undef, label %if.then226, label %if.else326

				if.then226: ; preds = %if.else218
				%size227 = getelementptr inbounds %s, %s* undef, i32 0, i32 0
				%0 = load i32, i32* %size227, align 4
				unreachable

				if.else326: ; preds = %if.else218
				%size330 = getelementptr inbounds %s, %s* undef, i32 0, i32 0
				%1 = load i32, i32* %size330, align 4
				unreachable

				cleanup: ; preds = %while.end, %cond.end118
				ret void
				}