This is an archive of the discontinued LLVM Phabricator instance.

Differential D21777

[MemorySSA] Switch to a different walker
Closed, Public

Authored by george.burgess.iv on Jun 27 2016, 5:24 PM.

Details

Reviewers

reames
gberry
dberlin

Commits

rG5f30897b7bb9: [MemorySSA] Update to the new shiny walker.
rL275940: [MemorySSA] Update to the new shiny walker.

Summary

This patch switches MemorySSA to use a new walker. The motivating reasons for this are:

Accuracy: the old walker cached things *really* eagerly, so it had to be overly conservative in many cases.
Speed of phi optimization: in some cases, the old walker would walk *way* more than it had to in order to determine if a phi was optimizable or not. This walker takes a more incremental approach, so we do as little work as we reasonably can.
Flexibility: this walker can be extended without much effort to do things like getting all clobbers that block a phi optimization. This will supposedly be useful in the future. Also, phi optimization is split into its own little world, so we can easily switch it off entirely if we want to (e.g. for -O1).
Testability: we can turn the cache on/off, so it's trivial to check whether the cache causes us to give a different answer than manually walking.

The big parts of this patch (in terms of line count) are caching and phi optimization. Phi optimization is more or less an iteratively expanding DFS; given a phi P, we figure out what the nearest legal optimization of P, Q, would be, then we see if there are any clobbers on any path from P to Q. If so, we quit. If not, we walk from Q to its dominating phi, and repeat. Caching is done after we've done all walking. Caching is just difficult in general. :)
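
A toy model of the loop described above may make it easier to follow. It is only a sketch under stated assumptions, not the patch's code: `Access`, `makeDiamond`, `hasClobberBetween` and `optimizePhi` are hypothetical names, the walk target Q is supplied by hand instead of being computed from the dominator tree, and memory locations, phi translation and caching are all ignored.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy access graph (hypothetical; not MemorySSA's real data structures).
struct Access {
  bool IsPhi;
  bool Clobbers;             // only meaningful for defs
  std::vector<int> Defining; // one entry for a def, several for a phi
  int WalkTarget;            // for phis: nearest dominating def/phi, else -1
};

// Returns true if some path from phi P up to Target passes through a clobber.
bool hasClobberBetween(const std::vector<Access> &G, int P, int Target) {
  std::vector<int> Stack = G[P].Defining;
  std::set<int> Visited;
  while (!Stack.empty()) {
    int Cur = Stack.back();
    Stack.pop_back();
    // Stop at the target, and never revisit an access.
    if (Cur == Target || !Visited.insert(Cur).second)
      continue;
    if (!G[Cur].IsPhi && G[Cur].Clobbers)
      return true;
    for (int D : G[Cur].Defining)
      Stack.push_back(D);
  }
  return false;
}

// Returns the access that a query starting at phi P can be optimized to.
int optimizePhi(const std::vector<Access> &G, int P) {
  assert(G[P].IsPhi && "Expected to start at a phi");
  int Current = P;
  while (true) {
    int Target = G[Current].WalkTarget;
    assert(Target >= 0 && "Phi without a walk target");
    if (hasClobberBetween(G, Current, Target))
      return Current; // blocked: we cannot optimize past this phi
    // Walk from the target up its def chain to the next clobber or phi.
    int Cur = Target;
    while (!G[Cur].IsPhi && !G[Cur].Clobbers)
      Cur = G[Cur].Defining[0];
    if (!G[Cur].IsPhi)
      return Cur; // reached a clobbering def (possibly liveOnEntry)
    Current = Cur; // reached a dominating phi: repeat from there
  }
}

// A diamond: access 1 sits above the branch, 2 and 3 are the two arms, and
// phi 4 merges them. liveOnEntry (access 0) is modelled as a clobbering def.
std::vector<Access> makeDiamond(bool RightArmClobbers) {
  return {
      {false, true, {}, -1},              // 0: liveOnEntry
      {false, true, {0}, -1},             // 1: clobbering def above the diamond
      {false, false, {1}, -1},            // 2: left arm, no clobber
      {false, RightArmClobbers, {1}, -1}, // 3: right arm
      {true, false, {2, 3}, 1},           // 4: phi with walk target 1
  };
}
```

In this model, `optimizePhi(makeDiamond(false), 4)` skips the phi and lands on access 1, while `optimizePhi(makeDiamond(true), 4)` stays at the phi because the right arm clobbers.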

This patch also includes a stupidly simple verify-this-optimization-isn't-broken function.

This patch verifies with EXPENSIVE_CHECKS enabled for MSSA.cpp when bootstrapping all of clang/LLVM.

Other smaller (but still noteworthy) things:

With this patch, we shouldn't cache MemoryUses anymore. This is desirable, since they always point to their clobber, so having them in the cache was kind of redundant.
When running this across LLVM/clang, I didn't notice a major speed difference between this and the old walker. A script tells me it's in the noise, but note that the test wasn't quite scientific (read: time ninja -j$((NCORES * 1.1))).
The explicit call caching was removed from the upwards query. If it needs to be there, it can be readded without much effort, but the goal was to keep this patch as simple as I could, while not dropping accuracy.
The ClobberWalker::reset() function exists because an idea that's been thrown around is a bulk-update MSSA API (or some kind of thing where we'll do N updates back-to-back). If we don't need to drop the walker's cache of BB -> nearest-optimizable-access, then it shouldn't be terrible to keep it.

Finally, note that there were some refactors split out from this, so it's easier to review. If you want to run things locally, you'll need to apply this on top of D21776.

Event Timeline

george.burgess.iv updated this revision to Diff 62045. Jun 27 2016, 5:24 PM

george.burgess.iv retitled this revision from to [MemorySSA] Switch to a different walker.

george.burgess.iv updated this object.

george.burgess.iv added reviewers: dberlin, reames, gberry.

george.burgess.iv added a parent revision: D21776: MSSA Walker Pre-refactor.

george.burgess.iv added a subscriber: llvm-commits.

> The explicit call caching was removed from the upwards query. If it needs to be there, it can be readded without much effort, but the goal was to keep this patch as simple as I could, while not dropping accuracy.

Sure. I added this because it makes newgvn faster, but i can make a walker for newgvn that caches more, it doesn't have to be the default walker :)

> The ClobberWalker::reset() function exists because an idea that's been thrown around is a bulk-update MSSA API (or some kind of thing where we'll do N updates back-to-back). If we don

FWIW, as mentioned on a different thread, you can't make bulk update faster than "destroy everything" without a bunch of work.

gberry added a child revision: D19821: [EarlyCSE] Optionally use MemorySSA. NFC. Jul 5 2016, 11:00 AM

Is anyone planning on looking at this soon? I can probably spend some time this week looking at it but I assumed @dberlin was planning on looking at it?
FWIW, I have some EarlyCSE & LICM w/ MemorySSA performance measurements that I believe are blocked on this change.

(FYI: rebasing this patch wasn't trivial when I tried to do so earlier today; I'll get out an updated version that applies cleanly to trunk in a bit.)

Fix up the comments and LGTM

lib/Transforms/Utils/MemorySSA.cpp
198	This seems like a really expensive assert, no? (maybe it should be in XDEBUG or whatever expensive checking is these days)
206	Please document what the parameters are supposed to be ;)
212	Nit: Move this right before the while loop start.
220	A comment on what, precisely, this is doing, would be helpful.
222	Why? (don't answer, just document :P)
229	Why not |= instead of HadDefinitiveClobber = HadDefinitiveClobber || :)
351	This is probably better named WalkTargetCache :)
357	I assume this is not expensive in practice, because otherwise, you can just track it when we build memoryssa, and always have an answer to "lastnonuse for a block". (even under updates, it's trivial to update in O(1)) Right now this looks N^2 since the dom tree above us could contain every block, and it could be all loads and then one store at the top.
1626	remove the heh :)
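
The concern in the comment on line 357 can be modelled with a toy dominator chain. The sketch below is an illustration only: `ToyWalker`, `stepsOnChain`, `kLiveOnEntry` and the integer block and access IDs are hypothetical, whereas the real code works on `DomTreeNode`s and MemorySSA access lists. It counts how many dominator-tree edges are followed when every block of an all-loads chain (one store at the top) is queried, with and without a per-block memo in the spirit of the patch's `PhiTargetCache`.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

constexpr int kLiveOnEntry = 0; // hypothetical ID for liveOnEntry

// Toy model: block I has immediate dominator IDom[I] (-1 for the entry
// block), and LastDef[I] is the ID of the last non-use access in the block,
// or -1 if the block contains only uses.
struct ToyWalker {
  std::vector<int> IDom;
  std::vector<int> LastDef;
  std::unordered_map<int, int> TargetCache; // block -> nearest dominating def
  long Steps = 0;                           // dominator-tree edges followed

  int getWalkTarget(int Block) {
    auto At = TargetCache.find(Block);
    if (At != TargetCache.end())
      return At->second;

    std::vector<int> ToCache{Block};
    int Result = kLiveOnEntry;
    for (int Node = IDom[Block]; Node != -1; Node = IDom[Node]) {
      ++Steps;
      // A def in this dominator wins over anything cached for it.
      if (LastDef[Node] != -1) {
        Result = LastDef[Node];
        break;
      }
      auto Hit = TargetCache.find(Node);
      if (Hit != TargetCache.end()) {
        Result = Hit->second;
        break;
      }
      ToCache.push_back(Node);
    }
    // Every def-free block we walked through shares the same answer.
    for (int B : ToCache)
      TargetCache.emplace(B, Result);
    return Result;
  }
};

// Builds a straight-line chain of N blocks with a single def (ID 1) in the
// entry block, queries every block bottom-up, and returns the edges followed.
long stepsOnChain(int N, bool KeepCache) {
  ToyWalker W;
  for (int I = 0; I < N; ++I) {
    W.IDom.push_back(I - 1);
    W.LastDef.push_back(I == 0 ? 1 : -1);
  }
  for (int B = N - 1; B >= 0; --B) {
    int Target = W.getWalkTarget(B);
    assert(Target == (B == 0 ? kLiveOnEntry : 1));
    (void)Target;
    if (!KeepCache)
      W.TargetCache.clear();
  }
  return W.Steps;
}
```

Under these assumptions, `stepsOnChain(100, true)` follows 99 edges while `stepsOnChain(100, false)` follows 4950, which is the linear versus quadratic gap in question.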

This revision is now accepted and ready to land. Jul 18 2016, 3:41 PM

george.burgess.iv marked 9 inline comments as done. Jul 18 2016, 6:36 PM

george.burgess.iv added inline comments.

lib/Transforms/Utils/MemorySSA.cpp
198	Good point
229	I think `|=` wouldn't let us short-circuit, and given that `instructionClobbersQuery` isn't completely trivial, I think short-circuiting may be beneficial here. :) If I'm wrong, I'm happy to swap to `|=` in a followup commit.
357	> otherwise, you can just track it when we build memoryssa, and always have an answer to "lastnonuse for a block".
Given that I ended up un-killing `findDominatingDef` (and that this basically does the same job as `findDominatingDef` AFAICT), it may be a good idea to swap to that in the future, yeah. Will add a FIXME. :)
> Right now this looks N^2 since the dom tree above us could contain every block, and it could be all loads and then one store at the top.
FWIW, because we have caching (`PhiTargetCache`), it's linear-time worst-case when we're initially optimizing uses. We kill that cache when updates happen, though, because it relies on the domtree/set of MemoryDefs not changing, so it may be n^2 there.
1626	Oops :)
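
The short-circuit point raised for line 229 can be checked with a small standalone example. This is a sketch, not code from the patch: `CountingQuery` is a hypothetical stand-in for a non-trivial predicate such as `instructionClobbersQuery`, and the two helper functions exist only to count how often it runs.

```cpp
#include <cassert>

// Hypothetical stand-in for a non-trivial query; it records how many times
// it has been evaluated and always reports a clobber.
struct CountingQuery {
  int Calls = 0;
  bool operator()() {
    ++Calls;
    return true;
  }
};

// `Flag = Flag || Q()` skips Q() once Flag is already true, because the
// built-in || operator short-circuits.
int callsWithLogicalOr(int Iterations) {
  assert(Iterations >= 0);
  CountingQuery Q;
  bool Flag = false;
  for (int I = 0; I < Iterations; ++I)
    Flag = Flag || Q();
  return Q.Calls;
}

// `Flag |= Q()` evaluates Q() every time: |= is a plain compound assignment
// and has no short-circuit semantics.
int callsWithOrAssign(int Iterations) {
  assert(Iterations >= 0);
  CountingQuery Q;
  bool Flag = false;
  for (int I = 0; I < Iterations; ++I)
    Flag |= Q();
  return Q.Calls;
}
```

With five iterations, the `||` form evaluates the query once and the `|=` form evaluates it five times. Whether that difference matters in practice depends on how expensive the real query is.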

Closed by commit rL275940: [MemorySSA] Update to the new shiny walker. (authored by gbiv). Jul 18 2016, 6:36 PM

This revision was automatically updated to reflect the committed changes.

george.burgess.iv mentioned this in D21776: MSSA Walker Pre-refactor. Jul 18 2016, 6:38 PM

Revision Contents

Path	Size
include/llvm/Transforms/Utils/MemorySSA.h	6 lines
lib/Transforms/Utils/MemorySSA.cpp	847 lines
test/Transforms/Util/MemorySSA/cyclicphi.ll	3 lines
test/Transforms/Util/MemorySSA/phi-translation.ll	3 lines

Diff 62045

include/llvm/Transforms/Utils/MemorySSA.h

[...]	public:
/// \brief Remove a MemoryAccess from MemorySSA, including updating all		/// \brief Remove a MemoryAccess from MemorySSA, including updating all
/// definitions and uses.		/// definitions and uses.
/// This should be called when a memory instruction that has a MemoryAccess		/// This should be called when a memory instruction that has a MemoryAccess
/// associated with it is erased from the program. For example, if a store or		/// associated with it is erased from the program. For example, if a store or
/// load is simply erased (not replaced), removeMemoryAccess should be called		/// load is simply erased (not replaced), removeMemoryAccess should be called
/// on the MemoryAccess for that store/load.		/// on the MemoryAccess for that store/load.
void removeMemoryAccess(MemoryAccess *);		void removeMemoryAccess(MemoryAccess *);

/// \brief Given two memory accesses in the same basic block, determine		/// \brief Given two memory accesses in potentially different blocks,
/// whether MemoryAccess \p A dominates MemoryAccess \p B.		/// determine whether MemoryAccess \p A dominates MemoryAccess \p B.
bool locallyDominates(const MemoryAccess *A, const MemoryAccess *B) const;		bool dominates(const MemoryAccess *A, const MemoryAccess *B) const;

/// \brief Verify that MemorySSA is self consistent (IE definitions dominate		/// \brief Verify that MemorySSA is self consistent (IE definitions dominate
/// all uses, uses appear in the right places). This is used by unit tests.		/// all uses, uses appear in the right places). This is used by unit tests.
void verifyMemorySSA() const;		void verifyMemorySSA() const;

protected:		protected:
// Used by Memory SSA annotater, dumpers, and wrapper pass		// Used by Memory SSA annotater, dumpers, and wrapper pass
friend class MemorySSAAnnotatedWriter;		friend class MemorySSAAnnotatedWriter;
[...]

lib/Transforms/Utils/MemorySSA.cpp

[...]
//===----------------------------------------------------------------===//		//===----------------------------------------------------------------===//
#include "llvm/Transforms/Utils/MemorySSA.h"		#include "llvm/Transforms/Utils/MemorySSA.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/DepthFirstIterator.h"		#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/GraphTraits.h"		#include "llvm/ADT/GraphTraits.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
		#include "llvm/ADT/SmallBitVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/IteratedDominanceFrontier.h"		#include "llvm/Analysis/IteratedDominanceFrontier.h"
#include "llvm/Analysis/MemoryLocation.h"		#include "llvm/Analysis/MemoryLocation.h"
[...]
struct UpwardsMemoryQuery {		struct UpwardsMemoryQuery {
// True if our original query started off as a call		// True if our original query started off as a call
bool IsCall;		bool IsCall;
// The pointer location we started the query with. This will be empty if		// The pointer location we started the query with. This will be empty if
// IsCall is true.		// IsCall is true.
MemoryLocation StartingLoc;		MemoryLocation StartingLoc;
// This is the instruction we were querying about.		// This is the instruction we were querying about.
const Instruction *Inst;		const Instruction *Inst;
// Set of visited Instructions for this query.
DenseSet<MemoryAccessPair> Visited;
// Vector of visited call accesses for this query. This is separated out
// because you can always cache and lookup the result of call queries (IE when
// IsCall == true) for every call in the chain. The calls have no AA location
// associated with them with them, and thus, no context dependence.
SmallVector<const MemoryAccess *, 32> VisitedCalls;
// The MemoryAccess we actually got called with, used to test local domination		// The MemoryAccess we actually got called with, used to test local domination
const MemoryAccess *OriginalAccess;		const MemoryAccess *OriginalAccess;

UpwardsMemoryQuery()		UpwardsMemoryQuery()
: IsCall(false), Inst(nullptr), OriginalAccess(nullptr) {}		: IsCall(false), Inst(nullptr), OriginalAccess(nullptr) {}

UpwardsMemoryQuery(const Instruction *Inst, const MemoryAccess *Access)		UpwardsMemoryQuery(const Instruction *Inst, const MemoryAccess *Access)
: IsCall(ImmutableCallSite(Inst)), Inst(Inst), OriginalAccess(Access) {		: IsCall(ImmutableCallSite(Inst)), Inst(Inst), OriginalAccess(Access) {
[...]	for (auto &P : Accesses)
return true;		return true;
for (auto &P : Calls)		for (auto &P : Calls)
if (P.first == MA || P.second == MA)		if (P.first == MA || P.second == MA)
return true;		return true;
return false;		return false;
}		}
};		};

		/// Walks the defining uses of MemoryDefs. Stops after we hit something that has
		/// no defining use (e.g. a MemoryPhi or liveOnEntry). Note that, when comparing
		/// against a null def_chain_iterator, this will compare equal only after
		/// walking said Phi/liveOnEntry.
		struct def_chain_iterator
		: public iterator_facade_base<def_chain_iterator, std::forward_iterator_tag,
		MemoryAccess *> {
		def_chain_iterator() : MA(nullptr) {}
		def_chain_iterator(MemoryAccess *MA) : MA(MA) {}

		MemoryAccess *operator*() const { return MA; }

		def_chain_iterator &operator++() {
		// N.B. liveOnEntry has a null defining access.
		if (auto *MUD = dyn_cast<MemoryUseOrDef>(MA))
		MA = MUD->getDefiningAccess();
		else
		MA = nullptr;
		return *this;
		}

		bool operator==(const def_chain_iterator &O) const { return MA == O.MA; }

		private:
		MemoryAccess *MA;
		};

		static iterator_range<def_chain_iterator>
		def_chain(MemoryAccess *MA, MemoryAccess *UpTo = nullptr) {
		assert((!UpTo || find(def_chain(MA), UpTo) != def_chain_iterator()) &&
		"UpTo isn't in the def chain!");
		return make_range(def_chain_iterator(MA), def_chain_iterator(UpTo));
		}

		/// Verify the result of a memory query. This basically re-walks everything on
		/// its own, and is meant to be as simple and self-contained as possible.
		/// Because it uses no cache, etc., it can get expensive.
		static void LLVM_ATTRIBUTE_UNUSED
		checkClobberSanity(MemoryAccess *Start, MemoryAccess *ClobberAt,
		const MemoryLocation &StartLoc, const MemorySSA &MSSA,
		const UpwardsMemoryQuery &Query, AliasAnalysis &AA) {
		DenseSet<MemoryAccessPair> VisitedPhis;
		SmallVector<MemoryAccessPair, 8> Worklist;
		Worklist.emplace_back(Start, StartLoc);

		assert(MSSA.dominates(ClobberAt, Start) && "Clobber doesn't dominate start?");
		assert((MSSA.isLiveOnEntryDef(Start) || isa<MemoryPhi>(Start) ||
		Start != ClobberAt) &&
		"Start can't clobber itself!");

		bool HadDefinitiveClobber = false;
		while (!Worklist.empty()) {
		MemoryAccessPair MAP = Worklist.pop_back_val();
		// No revisiting.
		if (!VisitedPhis.insert(MAP).second)
		continue;

		for (MemoryAccess *MA : def_chain(MAP.first)) {
		if (MA == ClobberAt) {
		if (auto *MD = dyn_cast<MemoryDef>(MA)) {
		HadDefinitiveClobber =
		HadDefinitiveClobber || MSSA.isLiveOnEntryDef(MD) ||
		instructionClobbersQuery(MD, MAP.second, Query, AA);
		} else {
		// Phis count as a clobber, for our purposes.
		HadDefinitiveClobber = true;
		}
		break;
		}

		// We should never hit liveOnEntry, unless it's the clobber.
		assert(!MSSA.isLiveOnEntryDef(MA) && "Hit liveOnEntry before clobber?");

		if (auto *MD = dyn_cast<MemoryDef>(MA)) {
		// If we have a loop, and Start wasn't the last def in its BB, then we
		// may see Start again. Quit early if we do.
		assert((MA == Start ||
		!instructionClobbersQuery(MD, MAP.second, Query, AA)) &&
		"Found clobber before reaching ClobberAt!");
		continue;
		}

		assert(isa<MemoryPhi>(MA));
		Worklist.append(upward_defs_begin({MA, MAP.second}), upward_defs_end());
		}
		}

		assert(HadDefinitiveClobber && "ClobberAt never acted as a clobber");
		}

		/// Our algorithm for walking (and trying to optimize) clobbers, all wrapped up
		/// in one class.
		class ClobberWalker {
		/// Save a few bytes by using unsigned instead of size_t.
		using ListIndex = unsigned;

		/// Represents a span of contiguous MemoryDefs, potentially ending in a
		/// MemoryPhi.
		struct DefPath {
		MemoryLocation Loc;
		// Note that, because we always walk in reverse, Last will always dominate
		// First. Also note that First and Last are inclusive.
		MemoryAccess *First;
		MemoryAccess *Last;
		// N.B. Blocker is currently basically unused. The goal is to use it to make
		// cache invalidation better, but we're not there yet.
		MemoryAccess *Blocker;
		Optional<ListIndex> Previous;

		DefPath(const MemoryLocation &Loc, MemoryAccess *First, MemoryAccess *Last,
		Optional<ListIndex> Previous)
		: Loc(Loc), First(First), Last(Last), Previous(Previous) {}

		DefPath(const MemoryLocation &Loc, MemoryAccess *Init,
		Optional<ListIndex> Previous)
		: DefPath(Loc, Init, Init, Previous) {}
		};

		const MemorySSA &MSSA;
		AliasAnalysis &AA;
		DominatorTree &DT;
		WalkerCache &WC;
		UpwardsMemoryQuery *Query;
		bool UseCache;

		// Phi optimization bookkeeping
		SmallVector<DefPath, 32> Paths;
		DenseSet<ConstMemoryAccessPair> VisitedPhis;
		DenseMap<const BasicBlock *, MemoryAccess *> PhiTargetCache;

		void setUseCache(bool Use) { UseCache = Use; }
		bool shouldIgnoreCache() const {
		// UseCache will only be false when we're debugging, or when expensive
		// checks are enabled. In either case, we don't care deeply about speed.
		return LLVM_UNLIKELY(!UseCache);
		}

		void addCacheEntry(const MemoryAccess *What, MemoryAccess *To,
		const MemoryLocation &Loc) const {
		// EXPENSIVE_CHECKS because most of these queries are redundant, and if What
		// and To are in the same BB, that gives us n^2 behavior.
		#ifdef EXPENSIVE_CHECKS
		assert(MSSA.dominates(To, What));
		#endif
		if (shouldIgnoreCache())
		return;
		WC.insert(What, To, Loc, Query->IsCall);
		}

		MemoryAccess *lookupCache(const MemoryAccess *MA, const MemoryLocation &Loc) {
		return shouldIgnoreCache() ? nullptr : WC.lookup(MA, Loc, Query->IsCall);
		}

		void cacheDefPath(const DefPath &DN, MemoryAccess *Target) const {
		if (shouldIgnoreCache())
		return;

		for (MemoryAccess *MA : def_chain(DN.First, DN.Last))
		addCacheEntry(MA, Target, DN.Loc);

		// DefPaths only express the path we walked. So, DN.Last could either be a
		// thing we want to cache, or not.
		if (DN.Last != Target)
		addCacheEntry(DN.Last, Target, DN.Loc);
		}

		/// Find the nearest def or phi that `From` can legally be optimized to.
		MemoryAccess *getWalkTarget(const MemoryPhi *From) {
		assert(!MSSA.isLiveOnEntryDef(From) && "liveOnEntry has no target.");
		assert(From->getNumOperands() && "Phi with no operands?");

		BasicBlock *BB = From->getBlock();
		auto At = PhiTargetCache.find(BB);
		if (At != PhiTargetCache.end())
		return At->second;

		SmallVector<const BasicBlock *, 8> ToCache;
		ToCache.push_back(BB);

		MemoryAccess *Result = MSSA.getLiveOnEntryDef();
		DomTreeNode *Node = DT.getNode(BB);
		while ((Node = Node->getIDom())) {
		auto At = PhiTargetCache.find(BB);
		if (At != PhiTargetCache.end()) {
		Result = At->second;
		break;
		}

		auto *Accesses = MSSA.getBlockAccesses(Node->getBlock());
		if (Accesses) {
		auto Iter = find_if(reverse(*Accesses), [](const MemoryAccess &MA) {
		return !isa<MemoryUse>(MA);
		});
		if (Iter != Accesses->rend()) {
		Result = const_cast<MemoryAccess *>(&*Iter);
		break;
		}
		}

		ToCache.push_back(Node->getBlock());
		}

		for (const BasicBlock *BB : ToCache)
		PhiTargetCache.insert({BB, Result});
		return Result;
		}

		/// Result of calling walkToPhiOrClobber.
		struct UpwardsWalkResult {
		/// The "Result" of the walk. Either a clobber, the last thing we walked, or
		/// both.
		MemoryAccess *Result;
		bool IsKnownClobber;
		bool FromCache;
		};

		/// Walk to the next Phi or Clobber in the def chain starting at Desc.Last.
		/// This will update Desc.Last as it walks. It will (optionally) also stop at
		/// StopAt.
		///
		/// This does not test for whether StopAt is a clobber
		UpwardsWalkResult walkToPhiOrClobber(DefPath &Desc,
		MemoryAccess *StopAt = nullptr) {
		assert(!isa<MemoryUse>(Desc.Last) && "Uses don't exist in my world");

		for (MemoryAccess *Current : def_chain(Desc.Last)) {
		Desc.Last = Current;
		if (Current == StopAt)
		return {Current, false, false};

		if (auto *MD = dyn_cast<MemoryDef>(Current))
		if (MSSA.isLiveOnEntryDef(MD) ||
		instructionClobbersQuery(MD, Desc.Loc, *Query, AA))
		return {MD, true, false};

		// Cache checks must be done last, because if Current is a clobber, the
		// cache will contain the clobber for Current.
		if (MemoryAccess *MA = lookupCache(Current, Desc.Loc))
		return {MA, true, true};
		}

		assert(isa<MemoryPhi>(Desc.Last) &&
		"Ended at a non-clobber that's not a phi?");
		return {Desc.Last, false, false};
		}

		void addSearches(MemoryPhi *Phi, SmallVectorImpl<ListIndex> &PausedSearches,
		ListIndex PriorNode) {
		auto UpwardDefs = make_range(upward_defs_begin({Phi, Paths[PriorNode].Loc}),
		upward_defs_end());
		for (const MemoryAccessPair &P : UpwardDefs) {
		PausedSearches.push_back(Paths.size());
		Paths.emplace_back(P.second, P.first, PriorNode);
		}
		}

		/// Represents a search that terminated after finding a clobber. This clobber
		/// may or may not be present in the path of defs from LastNode..SearchStart,
		/// since it may have been retrieved from cache.
		struct TerminatedPath {
		MemoryAccess *Clobber;
		ListIndex LastNode;
		};

		/// Get an access that keeps us from optimizing to the given phi.
		///
		/// PausedSearches is an array of indices into the Paths array. Its incoming
		/// value is the indices of searches that stopped at the last phi optimization
		/// target. It's left in an unspecified state.
		///
		/// If this returns None, NewPaused is a vector of searches that terminated
		/// at StopWhere. Otherwise, NewPaused is left in an unspecified state.
		Optional<ListIndex>
		getBlockingAccess(MemoryAccess *StopWhere,
		SmallVectorImpl<ListIndex> &PausedSearches,
		SmallVectorImpl<ListIndex> &NewPaused,
		SmallVectorImpl<TerminatedPath> &Terminated) {
		assert(!PausedSearches.empty() && "No searches to continue?");

		// BFS vs DFS really doesn't make a difference here, so just do a DFS with
		// PausedSearches as our stack.
		while (!PausedSearches.empty()) {
		ListIndex PathIndex = PausedSearches.pop_back_val();
		DefPath &Node = Paths[PathIndex];

		// If we've already visited this path with this MemoryLocation, we don't
		// need to do so again.
		//
		// NOTE: That we just drop these paths on the ground makes caching
		// behavior sporadic. e.g. given a diamond:
		//   A
		//  B C
		//   D
		//
		// ...If we walk D, B, A, C, we'll only cache the result of phi
		// optimization for A, B, and D; C will be skipped because it dies here.
		// This arguably isn't the worst thing ever, since:
		// - We generally query things in a top-down order, so if we got below D
		// without needing cache entries for {C, MemLoc}, then chances are
		// that those cache entries would end up ultimately unused.
		// - We still cache things for A, so C only needs to walk up a bit.
		// If this behavior becomes problematic, we can fix without a ton of extra
		// work.
		if (!VisitedPhis.insert({Node.Last, Node.Loc}).second)
		continue;

		UpwardsWalkResult Res = walkToPhiOrClobber(Node, /*StopAt=*/StopWhere);
		if (Res.IsKnownClobber) {
		assert(Res.Result != StopWhere || Res.FromCache);
		// If this wasn't a cache hit, we hit a clobber when walking. That's a
		// failure.
		if (!Res.FromCache || !MSSA.dominates(Res.Result, StopWhere))
		return PathIndex;

		// Otherwise, it's a valid thing to potentially optimize to.
		Terminated.push_back({Res.Result, PathIndex});
		continue;
		}

		if (Res.Result == StopWhere) {
		// We've hit our target. Save this path off for if we want to continue
		// walking.
		NewPaused.push_back(PathIndex);
		continue;
		}

		assert(!MSSA.isLiveOnEntryDef(Res.Result) && "liveOnEntry is a clobber");
		addSearches(cast<MemoryPhi>(Res.Result), PausedSearches, PathIndex);
		}

		return None;
		}

		template <typename T, typename Walker>
		struct generic_def_path_iterator
		: public iterator_facade_base<generic_def_path_iterator<T, Walker>,
		std::forward_iterator_tag, T *> {
		generic_def_path_iterator() : W(nullptr), N(None) {}
		generic_def_path_iterator(Walker *W, ListIndex N) : W(W), N(N) {}

		T &operator*() const { return curNode(); }

		generic_def_path_iterator &operator++() {
		N = curNode().Previous;
		return *this;
		}

		bool operator==(const generic_def_path_iterator &O) const {
		if (N.hasValue() != O.N.hasValue())
		return false;
		return !N.hasValue() || N == O.N;
		}

		private:
		T &curNode() const { return W->Paths[*N]; }

		Walker *W;
		Optional<ListIndex> N;
		};

		using def_path_iterator = generic_def_path_iterator<DefPath, ClobberWalker>;
		using const_def_path_iterator =
		generic_def_path_iterator<const DefPath, const ClobberWalker>;

		iterator_range<def_path_iterator> def_path(ListIndex From) {
		return make_range(def_path_iterator(this, From), def_path_iterator());
		}

		iterator_range<const_def_path_iterator> const_def_path(ListIndex From) const {
		return make_range(const_def_path_iterator(this, From),
		const_def_path_iterator());
		}

		struct OptznResult {
		/// The path that contains our result.
		TerminatedPath PrimaryClobber;
		/// The paths that we can legally cache back from, but that aren't
		/// necessarily the result of the Phi optimization.
		SmallVector<TerminatedPath, 4> OtherClobbers;
		};

		ListIndex defPathIndex(const DefPath &N) const {
		// The assert looks nicer if we don't need to do &N
		const DefPath *NP = &N;
		assert(!Paths.empty() && NP >= &Paths.front() && NP <= &Paths.back() &&
		"Out of bounds DefPath!");
		return NP - &Paths.front();
		}

		/// Try to optimize a phi as best as we can. Returns a SmallVector of Paths
		/// that act as legal clobbers. Note that this won't return all clobbers.
		///
		/// Phi optimization algorithm tl;dr:
		/// - Find the earliest def/phi, A, we can optimize to
		/// - Find if all paths from the starting memory access ultimately reach A
		/// - If not, optimization isn't possible.
		/// - Otherwise, walk from A to another clobber or phi, A'.
		/// - If A' is a def, we're done.
		/// - If A' is a phi, try to optimize it.
		///
		/// A path is a series of {MemoryAccess, MemoryLocation} pairs. A path
		/// terminates when a MemoryAccess that clobbers said MemoryLocation is found.
		OptznResult tryOptimizePhi(MemoryPhi *Phi, MemoryAccess *Start,
		const MemoryLocation &Loc) {
		assert(Paths.empty() && VisitedPhis.empty() &&
		"Reset the optimization state.");

		Paths.emplace_back(Loc, Start, Phi, None);
		// Stores how many "valid" optimization nodes we had prior to calling
		// addSearches/getBlockingAccess. Necessary for caching if we had a blocker.
		auto PriorPathsSize = Paths.size();

		SmallVector<ListIndex, 16> PausedSearches;
		SmallVector<ListIndex, 8> NewPaused;
		SmallVector<TerminatedPath, 4> TerminatedPaths;

		addSearches(Phi, PausedSearches, 0);

		// Moves the TerminatedPath with the "most dominated" Clobber to the end of
		// Paths.
		auto MoveDominatedPathToEnd = [&](SmallVectorImpl<TerminatedPath> &Paths) {
		assert(!Paths.empty() && "Need a path to move");
		// FIXME: This is technically n^2 (n = distance(DefPath.First,
		// DefPath.Last)) because of local dominance checks.
		auto Dom = Paths.begin();
		for (auto I = std::next(Dom), E = Paths.end(); I != E; ++I)
		if (!MSSA.dominates(I->Clobber, Dom->Clobber))
		Dom = I;
		auto Last = Paths.end() - 1;
		if (Last != Dom)
		std::iter_swap(Last, Dom);
		};

		MemoryPhi *Current = Phi;
		while (1) {
		assert(!MSSA.isLiveOnEntryDef(Current) &&
		"liveOnEntry wasn't treated as a clobber?");

		MemoryAccess *Target = getWalkTarget(Current);
		// If a TerminatedPath doesn't dominate Target, then it wasn't a legal
		// optimization for the prior phi.
		assert(all_of(TerminatedPaths, [&](const TerminatedPath &P) {
		return MSSA.dominates(P.Clobber, Target);
		}));

		// FIXME: This is broken, because the Blocker may be reported to be
		// liveOnEntry, and we'll happily wait for that to disappear (read: never)
		// For the moment, this is fine, since we do basically nothing with
		// blocker info.
		if (Optional<ListIndex> Blocker = getBlockingAccess(
		Target, PausedSearches, NewPaused, TerminatedPaths)) {
		MemoryAccess *BlockingAccess = Paths[*Blocker].Last;
		// Cache our work on the blocking node, since we know that's correct.
		cacheDefPath(Paths[*Blocker], BlockingAccess);

		// Find the node we started at. We can't search based on N->Last, since
		// we may have gone around a loop with a different MemoryLocation.
		auto Iter = find_if(def_path(*Blocker), [&](const DefPath &N) {
		return defPathIndex(N) < PriorPathsSize;
		});
		assert(Iter != def_path_iterator());

		DefPath &CurNode = *Iter;
		assert(CurNode.Last == Current);
		CurNode.Blocker = BlockingAccess;

		// Two things:
		// A. We can't reliably cache all of NewPaused back. Consider a case
		// where we have two paths in NewPaused; one of which can't optimize
		// above this phi, whereas the other can. If we cache the second path
		// back, we'll end up with suboptimal cache entries. We can handle
		// cases like this a bit better when we either try to find all
		// clobbers that block phi optimization, or when our cache starts
		// supporting unfinished searches.
		// B. We can't reliably cache TerminatedPaths back here without doing
		// extra checks; consider a case like:
		// T
		// / \
		// D C
		// \ /
		// S
		// Where T is our target, C is a node with a clobber on it, D is a
		// diamond (with a clobber only on the left or right node, N), and
		// S is our start. Say we walk to D, through the node opposite N
		// (read: ignoring the clobber), and see a cache entry in the top
		// node of D. That cache entry gets put into TerminatedPaths. We then
		// walk up to C (N is later in our worklist), find the clobber, and
		// quit. If we append TerminatedPaths to OtherClobbers, we'll cache
		// the bottom part of D to the cached clobber, ignoring the clobber
		// in N. Again, this problem goes away if we start tracking all
		// blockers for a given phi optimization.
		TerminatedPath Result{CurNode.Last, defPathIndex(CurNode)};
		return {Result, {}};
		}

		// If there's nothing left to search, then all paths led to valid clobbers
		// that we got from our cache; pick the nearest to the start, and allow
		// the rest to be cached back.
		if (NewPaused.empty()) {
		MoveDominatedPathToEnd(TerminatedPaths);
		TerminatedPath Result = TerminatedPaths.pop_back_val();
		return {Result, std::move(TerminatedPaths)};
		}

		MemoryAccess *DefChainEnd = nullptr;
		SmallVector<TerminatedPath, 4> Clobbers;
		for (ListIndex Paused : NewPaused) {
		UpwardsWalkResult WR = walkToPhiOrClobber(Paths[Paused]);
		if (WR.IsKnownClobber)
		Clobbers.push_back({WR.Result, Paused});
		else
		// Micro-opt: If we hit the end of the chain, save it.
		DefChainEnd = WR.Result;
		}

		if (!TerminatedPaths.empty()) {
		// If we couldn't find the dominating phi/liveOnEntry in the above loop,
		// do it now.
		if (!DefChainEnd)
		for (MemoryAccess *MA : def_chain(Target))
		DefChainEnd = MA;

		// If any of the terminated paths don't dominate the phi we'll try to
		// optimize, we need to figure out what they are and quit.
		const BasicBlock *ChainBB = DefChainEnd->getBlock();
		for (const TerminatedPath &TP : TerminatedPaths) {
		// Because we know that DefChainEnd is as "high" as we can go, we
		// don't need local dominance checks; BB dominance is sufficient.
		if (DT.dominates(ChainBB, TP.Clobber->getBlock()))
		Clobbers.push_back(TP);
		}
		}

		// If we have clobbers in the def chain, find the one closest to Current
		// and quit.
		if (!Clobbers.empty()) {
		MoveDominatedPathToEnd(Clobbers);
		TerminatedPath Result = Clobbers.pop_back_val();
		return {Result, std::move(Clobbers)};
		}

		assert(all_of(NewPaused,
		[&](ListIndex I) { return Paths[I].Last == DefChainEnd; }));

		// Because liveOnEntry is a clobber, this must be a phi.
		auto *DefChainPhi = cast<MemoryPhi>(DefChainEnd);

		PriorPathsSize = Paths.size();
		PausedSearches.clear();
		for (ListIndex I : NewPaused)
		addSearches(DefChainPhi, PausedSearches, I);
		NewPaused.clear();

		Current = DefChainPhi;
		}
		}

		/// Caches everything in an OptznResult.
		void cacheOptResult(const OptznResult &R) {
		if (R.OtherClobbers.empty()) {
		// If we're not going to be caching OtherClobbers, don't bother with
		// marking visited/etc.
		for (const DefPath &N : const_def_path(R.PrimaryClobber.LastNode))
		cacheDefPath(N, R.PrimaryClobber.Clobber);
		return;
		}

		// PrimaryClobber is our answer. If we can cache anything back, we need to
		// stop caching when we visit PrimaryClobber.
		SmallBitVector Visited(Paths.size());
		for (const DefPath &N : const_def_path(R.PrimaryClobber.LastNode)) {
		Visited[defPathIndex(N)] = true;
		cacheDefPath(N, R.PrimaryClobber.Clobber);
		}

		for (const TerminatedPath &P : R.OtherClobbers) {
		for (const DefPath &N : const_def_path(P.LastNode)) {
		ListIndex NIndex = defPathIndex(N);
		if (Visited[NIndex])
		break;
		Visited[NIndex] = true;
		cacheDefPath(N, P.Clobber);
		}
		}
		}
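
The caching order in cacheOptResult can be illustrated with a small standalone sketch. It is not the patch's code: PathNode, Terminated, and cacheBack are hypothetical stand-ins for DefPath, TerminatedPath, and the cacheDefPath loops, and accesses are modelled as plain integers. The primary clobber is cached along its whole path first; each secondary clobber is then cached along its own path until it reaches a node that has already been claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for DefPath: each node records the node it was
// reached from, so a path can be recovered by following Previous links.
struct PathNode {
  int Access;                          // stand-in for a MemoryAccess
  std::optional<std::size_t> Previous; // index of the predecessor node, if any
};

// Hypothetical stand-in for TerminatedPath.
struct Terminated {
  int Clobber;          // the clobber this path ended at
  std::size_t LastNode; // index of the last node on the path
};

// Returns a map from access to cached clobber. The primary path is cached in
// full; every other path is cached only up to the first already-visited node.
std::map<int, int> cacheBack(const std::vector<PathNode> &Paths,
                             const Terminated &Primary,
                             const std::vector<Terminated> &Others) {
  std::map<int, int> Cache;
  std::vector<bool> Visited(Paths.size(), false);

  auto Walk = [&](const Terminated &T, bool StopAtVisited) {
    std::optional<std::size_t> I = T.LastNode;
    while (I) {
      assert(*I < Paths.size() && "Path index out of range");
      if (StopAtVisited && Visited[*I])
        break;
      Visited[*I] = true;
      Cache[Paths[*I].Access] = T.Clobber;
      I = Paths[*I].Previous;
    }
  };

  Walk(Primary, /*StopAtVisited=*/false);
  for (const Terminated &T : Others)
    Walk(T, /*StopAtVisited=*/true);
  return Cache;
}
```

Stopping at the first visited node keeps a secondary clobber from overwriting entries that belong to the primary answer on the shared part of the paths.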

		void verifyOptResult(const OptznResult &R) const {
		assert(all_of(R.OtherClobbers, [&](const TerminatedPath &P) {
		return MSSA.dominates(P.Clobber, R.PrimaryClobber.Clobber);
		}));
		}

		void resetPhiOptznState() {
		Paths.clear();
		VisitedPhis.clear();
		}

		public:
		ClobberWalker(const MemorySSA &MSSA, AliasAnalysis &AA, DominatorTree &DT,
		WalkerCache &WC)
		: MSSA(MSSA), AA(AA), DT(DT), WC(WC), UseCache(true) {}

		void reset() { PhiTargetCache.clear(); }

		/// Finds the nearest clobber for the given query, optimizing phis if
		/// possible.
		MemoryAccess *findClobber(MemoryAccess *Start, UpwardsMemoryQuery &Q,
		bool UseWalkerCache = true) {
		setUseCache(UseWalkerCache);
		Query = &Q;

		MemoryAccess *Current = Start;
		// This walker pretends uses don't exist. If we're handed one, silently grab
		// its def. (This has the nice side-effect of ensuring we never cache uses)
		if (auto *MU = dyn_cast<MemoryUse>(Start))
		Current = MU->getDefiningAccess();

		DefPath FirstDesc(Q.StartingLoc, Current, Current, None);
		// Fast path for the overly-common case (no crazy phi optimization
		// necessary)
		UpwardsWalkResult WalkResult = walkToPhiOrClobber(FirstDesc);
		if (WalkResult.IsKnownClobber) {
		cacheDefPath(FirstDesc, WalkResult.Result);
		return WalkResult.Result;
		}

		OptznResult OptRes =
		tryOptimizePhi(cast<MemoryPhi>(FirstDesc.Last), Current, Q.StartingLoc);
		verifyOptResult(OptRes);
		cacheOptResult(OptRes);
		resetPhiOptznState();

		#ifdef EXPENSIVE_CHECKS
		checkClobberSanity(Current, OptRes.PrimaryClobber.Clobber, Q.StartingLoc,
		MSSA, Q, AA);
		#endif
		return OptRes.PrimaryClobber.Clobber;
		}
		};

struct RenamePassData {		struct RenamePassData {
DomTreeNode *DTN;		DomTreeNode *DTN;
DomTreeNode::const_iterator ChildIt;		DomTreeNode::const_iterator ChildIt;
MemoryAccess *IncomingVal;		MemoryAccess *IncomingVal;

RenamePassData(DomTreeNode *D, DomTreeNode::const_iterator It,		RenamePassData(DomTreeNode *D, DomTreeNode::const_iterator It,
MemoryAccess *M)		MemoryAccess *M)
: DTN(D), ChildIt(It), IncomingVal(M) {}		: DTN(D), ChildIt(It), IncomingVal(M) {}
[... 36 lines not shown ...]
///		///
/// ; For completeness' sake, loading %a or %b again would not cache another		/// ; For completeness' sake, loading %a or %b again would not cache another
/// ; M entries.		/// ; M entries.
/// %r = add i32 %c, %d		/// %r = add i32 %c, %d
/// ret i32 %r		/// ret i32 %r
/// }		/// }
class MemorySSA::CachingWalker final : public MemorySSAWalker {		class MemorySSA::CachingWalker final : public MemorySSAWalker {
WalkerCache Cache;		WalkerCache Cache;
AliasAnalysis *AA;		ClobberWalker Walker;
DominatorTree *DT;		bool AutoResetWalker;

MemoryAccessPair UpwardsDFSWalk(MemoryAccess *, const MemoryLocation &,
UpwardsMemoryQuery &, bool);
MemoryAccess *getClobberingMemoryAccess(MemoryAccess *, UpwardsMemoryQuery &);		MemoryAccess *getClobberingMemoryAccess(MemoryAccess *, UpwardsMemoryQuery &);
void verifyRemoved(MemoryAccess *);		void verifyRemoved(MemoryAccess *);

public:		public:
CachingWalker(MemorySSA *, AliasAnalysis *, DominatorTree *);		CachingWalker(MemorySSA *, AliasAnalysis *, DominatorTree *);
~CachingWalker() override;		~CachingWalker() override;

MemoryAccess *getClobberingMemoryAccess(const Instruction *) override;		MemoryAccess *getClobberingMemoryAccess(const Instruction *) override;
MemoryAccess *getClobberingMemoryAccess(MemoryAccess *,		MemoryAccess *getClobberingMemoryAccess(MemoryAccess *,
MemoryLocation &) override;		MemoryLocation &) override;
void invalidateInfo(MemoryAccess *) override;		void invalidateInfo(MemoryAccess *) override;

		/// Whether we call resetClobberWalker() after each time we actually walk to
		/// answer a clobber query.
		void setAutoResetWalker(bool AutoReset) { AutoResetWalker = AutoReset; }

		/// Drop the walker's persistent data structures. At the moment, this means
		/// "drop the walker's cache of BasicBlocks ->
		/// earliest-MemoryAccess-we-can-optimize-to". This is necessary if we're
		/// going to have DT updates, if we remove MemoryAccesses, etc.
		void resetClobberWalker() { Walker.reset(); }
};		};

/// \brief Rename a single basic block into MemorySSA form.		/// \brief Rename a single basic block into MemorySSA form.
/// Uses the standard SSA renaming algorithm.		/// Uses the standard SSA renaming algorithm.
/// \returns The new incoming value.		/// \returns The new incoming value.
MemoryAccess *MemorySSA::renameBlock(BasicBlock *BB,		MemoryAccess *MemorySSA::renameBlock(BasicBlock *BB,
MemoryAccess *IncomingVal) {		MemoryAccess *IncomingVal) {
auto It = PerBlockAccesses.find(BB);		auto It = PerBlockAccesses.find(BB);
[... 217 lines not shown ...]	void MemorySSA::buildMemorySSA() {

// Now do regular SSA renaming on the MemoryDef/MemoryUse. Visited will get		// Now do regular SSA renaming on the MemoryDef/MemoryUse. Visited will get
// filled in with all blocks.		// filled in with all blocks.
SmallPtrSet<BasicBlock *, 16> Visited;		SmallPtrSet<BasicBlock *, 16> Visited;
renamePass(DT->getRootNode(), LiveOnEntryDef.get(), Visited);		renamePass(DT->getRootNode(), LiveOnEntryDef.get(), Visited);

CachingWalker *Walker = getWalkerImpl();		CachingWalker *Walker = getWalkerImpl();

		// We're doing a batch of updates; don't drop useful caches between them.
		Walker->setAutoResetWalker(false);

// Now optimize the MemoryUse's defining access to point to the nearest		// Now optimize the MemoryUse's defining access to point to the nearest
// dominating clobbering def.		// dominating clobbering def.
// This ensures that MemoryUse's that are killed by the same store are		// This ensures that MemoryUse's that are killed by the same store are
// immediate users of that store, one of the invariants we guarantee.		// immediate users of that store, one of the invariants we guarantee.
for (auto DomNode : depth_first(DT)) {		for (auto DomNode : depth_first(DT)) {
BasicBlock *BB = DomNode->getBlock();		BasicBlock *BB = DomNode->getBlock();
auto AI = PerBlockAccesses.find(BB);		auto AI = PerBlockAccesses.find(BB);
if (AI == PerBlockAccesses.end())		if (AI == PerBlockAccesses.end())
continue;		continue;
AccessList *Accesses = AI->second.get();		AccessList *Accesses = AI->second.get();
for (auto &MA : *Accesses) {		for (auto &MA : *Accesses) {
if (auto *MU = dyn_cast<MemoryUse>(&MA)) {		if (auto *MU = dyn_cast<MemoryUse>(&MA)) {
Instruction *Inst = MU->getMemoryInst();		Instruction *Inst = MU->getMemoryInst();
MU->setDefiningAccess(Walker->getClobberingMemoryAccess(Inst));		MU->setDefiningAccess(Walker->getClobberingMemoryAccess(Inst));
}		}
}		}
}		}

		Walker->setAutoResetWalker(true);
		Walker->resetClobberWalker();

// Mark the uses in unreachable blocks as live on entry, so that they go		// Mark the uses in unreachable blocks as live on entry, so that they go
// somewhere.		// somewhere.
for (auto &BB : F)		for (auto &BB : F)
if (!Visited.count(&BB))		if (!Visited.count(&BB))
markUnreachableAsLiveOnEntry(&BB);		markUnreachableAsLiveOnEntry(&BB);
}		}

MemorySSAWalker *MemorySSA::getWalker() { return getWalkerImpl(); }		MemorySSAWalker *MemorySSA::getWalker() { return getWalkerImpl(); }
[... 337 lines not shown ...]
MemoryAccess *MemorySSA::getMemoryAccess(const Value *I) const {		MemoryAccess *MemorySSA::getMemoryAccess(const Value *I) const {
return ValueToMemoryAccess.lookup(I);		return ValueToMemoryAccess.lookup(I);
}		}

MemoryPhi *MemorySSA::getMemoryAccess(const BasicBlock *BB) const {		MemoryPhi *MemorySSA::getMemoryAccess(const BasicBlock *BB) const {
return cast_or_null<MemoryPhi>(getMemoryAccess((const Value *)BB));		return cast_or_null<MemoryPhi>(getMemoryAccess((const Value *)BB));
}		}

/// \brief Determine, for two memory accesses in the same block,		bool MemorySSA::dominates(const MemoryAccess *Dominator,
/// whether \p Dominator dominates \p Dominatee.
/// \returns True if \p Dominator dominates \p Dominatee.
bool MemorySSA::locallyDominates(const MemoryAccess *Dominator,
const MemoryAccess *Dominatee) const {		const MemoryAccess *Dominatee) const {
assert((Dominator->getBlock() == Dominatee->getBlock()) &&		if (Dominator == Dominatee || isLiveOnEntryDef(Dominator))
"Asking for local domination when accesses are in different blocks!");

// A node dominates itself, and liveOnEntry dominates everything.
if (Dominatee == Dominator || isLiveOnEntryDef(Dominator))
return true;		return true;

// When Dominatee is defined on function entry, it is not dominated by another
// memory access.
if (isLiveOnEntryDef(Dominatee))		if (isLiveOnEntryDef(Dominatee))
return false;		return false;

// Get the access list for the block		if (Dominator->getBlock() != Dominatee->getBlock())
		return DT->dominates(Dominator->getBlock(), Dominatee->getBlock());

		// Test for local domination.
const AccessList *AccessList = getBlockAccesses(Dominator->getBlock());		const AccessList *AccessList = getBlockAccesses(Dominator->getBlock());
AccessList::const_reverse_iterator It(Dominator->getIterator());		AccessList::const_reverse_iterator It(Dominator->getIterator());

// If we hit the beginning of the access list before we hit dominatee, we must		// If we hit the beginning of the access list before we hit dominatee, we must
// dominate it.		// dominate it.
return std::none_of(It, AccessList->rend(),		return std::none_of(It, AccessList->rend(),
[&](const MemoryAccess &MA) { return &MA == Dominatee; });		[&](const MemoryAccess &MA) { return &MA == Dominatee; });
}		}
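
The same-block check at the end of dominates can be tried out in isolation. The following is a minimal sketch under stated assumptions: locallyDominatesSketch is a hypothetical name, and a std::list<int> stands in for the block's AccessList. As in the function above, it scans backwards from just before the dominator; if the dominatee is not found before the start of the list, the dominator comes first and therefore dominates.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Returns true if Dominator appears at or before Dominatee in Accesses.
// Both iterators are assumed to point at elements of Accesses (not end()).
bool locallyDominatesSketch(const std::list<int> &Accesses,
                            std::list<int>::const_iterator Dominator,
                            std::list<int>::const_iterator Dominatee) {
  // A node dominates itself.
  if (Dominator == Dominatee)
    return true;

  // A reverse iterator built from Dominator first yields the element just
  // before Dominator, then walks towards the front of the list.
  std::list<int>::const_reverse_iterator It(Dominator);
  const int *Target = &*Dominatee;
  return std::none_of(It, Accesses.rend(),
                      [&](const int &A) { return &A == Target; });
}
```

The scan is linear in the dominator's distance from the start of the block, which is why repeated local dominance checks add up, as the n^2 FIXME in MoveDominatedPathToEnd notes.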
[... 103 lines not shown ...]
void MemorySSAWrapperPass::print(raw_ostream &OS, const Module *M) const {		void MemorySSAWrapperPass::print(raw_ostream &OS, const Module *M) const {
MSSA->print(OS);		MSSA->print(OS);
}		}

MemorySSAWalker::MemorySSAWalker(MemorySSA *M) : MSSA(M) {}		MemorySSAWalker::MemorySSAWalker(MemorySSA *M) : MSSA(M) {}

MemorySSA::CachingWalker::CachingWalker(MemorySSA *M, AliasAnalysis *A,		MemorySSA::CachingWalker::CachingWalker(MemorySSA *M, AliasAnalysis *A,
DominatorTree *D)		DominatorTree *D)
: MemorySSAWalker(M), AA(A), DT(D) {}		: MemorySSAWalker(M), Walker(*M, *A, *D, Cache) /* heh */,
		[Inline comment] dberlin: remove the heh :)
		[Inline comment] george.burgess.iv (author): Oops :)
		AutoResetWalker(true) {}

MemorySSA::CachingWalker::~CachingWalker() {}		MemorySSA::CachingWalker::~CachingWalker() {}

void MemorySSA::CachingWalker::invalidateInfo(MemoryAccess *MA) {		void MemorySSA::CachingWalker::invalidateInfo(MemoryAccess *MA) {
// TODO: We can do much better cache invalidation with differently stored		// TODO: We can do much better cache invalidation with differently stored
// caches. For now, for MemoryUses, we simply remove them		// caches. For now, for MemoryUses, we simply remove them
// from the cache, and kill the entire call/non-call cache for everything		// from the cache, and kill the entire call/non-call cache for everything
// else. The problem is for phis or defs, currently we'd need to follow use		// else. The problem is for phis or defs, currently we'd need to follow use
[... 9 lines not shown ...]	if (MemoryUse *MU = dyn_cast<MemoryUse>(MA)) {
UpwardsMemoryQuery Q(MU->getMemoryInst(), MU);		UpwardsMemoryQuery Q(MU->getMemoryInst(), MU);
Cache.remove(MU, Q.StartingLoc, Q.IsCall);		Cache.remove(MU, Q.StartingLoc, Q.IsCall);
} else {		} else {
// If it is not a use, the best we can do right now is destroy the cache.		// If it is not a use, the best we can do right now is destroy the cache.
Cache.clear();		Cache.clear();
}		}

#ifdef EXPENSIVE_CHECKS		#ifdef EXPENSIVE_CHECKS
// Run this only when expensive checks are enabled.
verifyRemoved(MA);		verifyRemoved(MA);
#endif		#endif
}		}

MemoryAccessPair MemorySSA::CachingWalker::UpwardsDFSWalk(
MemoryAccess *StartingAccess, const MemoryLocation &Loc,
UpwardsMemoryQuery &Q, bool FollowingBackedge) {
MemoryAccess *ModifyingAccess = nullptr;

auto DFI = df_begin(StartingAccess);
for (auto DFE = df_end(StartingAccess); DFI != DFE;) {
MemoryAccess *CurrAccess = *DFI;
if (MSSA->isLiveOnEntryDef(CurrAccess))
return {CurrAccess, Loc};
// If this is a MemoryDef, check whether it clobbers our current query. This
// needs to be done before consulting the cache, because the cache reports
// the clobber for CurrAccess. If CurrAccess is a clobber for this query,
// and we ask the cache for information first, then we might skip this
// clobber, which is bad.
if (auto *MD = dyn_cast<MemoryDef>(CurrAccess)) {
// If we hit the top, stop following this path.
// While we can do lookups, we can't sanely do inserts here unless we were
// to track everything we saw along the way, since we don't know where we
// will stop.
if (instructionClobbersQuery(MD, Loc, Q, *AA)) {
ModifyingAccess = CurrAccess;
break;
}
}
if (auto CacheResult = Cache.lookup(CurrAccess, Loc, Q.IsCall))
return {CacheResult, Loc};

// We need to know whether it is a phi so we can track backedges.
// Otherwise, walk all upward defs.
if (!isa<MemoryPhi>(CurrAccess)) {
++DFI;
continue;
}

#ifndef NDEBUG
// The loop below visits the phi's children for us. Because phis are the
// only things with multiple edges, skipping the children should always lead
// us to the end of the loop.
//
// Use a copy of DFI because skipChildren would kill our search stack, which
// would make caching anything on the way back impossible.
auto DFICopy = DFI;
assert(DFICopy.skipChildren() == DFE &&
"Skipping phi's children doesn't end the DFS?");
#endif

const MemoryAccessPair PHIPair(CurrAccess, Loc);

// Don't try to optimize this phi again if we've already tried to do so.
if (!Q.Visited.insert(PHIPair).second) {
ModifyingAccess = CurrAccess;
break;
}

std::size_t InitialVisitedCallSize = Q.VisitedCalls.size();

// Recurse on PHI nodes, since we need to change locations.
// TODO: Allow graphtraits on pairs, which would turn this whole function
// into a normal single depth first walk.
MemoryAccess *FirstDef = nullptr;
for (auto MPI = upward_defs_begin(PHIPair), MPE = upward_defs_end();
MPI != MPE; ++MPI) {
bool Backedge =
!FollowingBackedge &&
DT->dominates(CurrAccess->getBlock(), MPI.getPhiArgBlock());

MemoryAccessPair CurrentPair =
UpwardsDFSWalk(MPI->first, MPI->second, Q, Backedge);
// All the phi arguments should reach the same point if we can bypass
// this phi. The alternative is that they hit this phi node, which
// means we can skip this argument.
if (FirstDef && CurrentPair.first != PHIPair.first &&
CurrentPair.first != FirstDef) {
ModifyingAccess = CurrAccess;
break;
}

if (!FirstDef)
FirstDef = CurrentPair.first;
}

// If we exited the loop early, go with the result it gave us.
if (!ModifyingAccess) {
assert(FirstDef && "Found a Phi with no upward defs?");
ModifyingAccess = FirstDef;
} else {
// If we can't optimize this Phi, then we can't safely cache any of the
// calls we visited when trying to optimize it. Wipe them out now.
Q.VisitedCalls.resize(InitialVisitedCallSize);
}
break;
}

if (!ModifyingAccess)
return {MSSA->getLiveOnEntryDef(), Q.StartingLoc};

const BasicBlock *OriginalBlock = StartingAccess->getBlock();
assert(DFI.getPathLength() > 0 && "We dropped our path?");
unsigned N = DFI.getPathLength();
// If we found a clobbering def, the last element in the path will be our
// clobber, so we don't want to cache that to itself. OTOH, if we optimized a
// phi, we can add the last thing in the path to the cache, since that won't
// be the result.
if (DFI.getPath(N - 1) == ModifyingAccess)
--N;
for (; N > 1; --N) {
MemoryAccess *CacheAccess = DFI.getPath(N - 1);
BasicBlock *CurrBlock = CacheAccess->getBlock();
if (!FollowingBackedge)
Cache.insert(CacheAccess, ModifyingAccess, Loc, Q.IsCall);
if (DT->dominates(CurrBlock, OriginalBlock) &&
(CurrBlock != OriginalBlock || !FollowingBackedge ||
MSSA->locallyDominates(CacheAccess, StartingAccess)))
break;
}

// Cache everything else on the way back. The caller should cache
// StartingAccess for us.
for (; N > 1; --N) {
MemoryAccess *CacheAccess = DFI.getPath(N - 1);
Cache.insert(CacheAccess, ModifyingAccess, Loc, Q.IsCall);
}
assert(Q.Visited.size() < 1000 && "Visited too much");

return {ModifyingAccess, Loc};
}

/// \brief Walk the use-def chains starting at \p MA and find		/// \brief Walk the use-def chains starting at \p MA and find
/// the MemoryAccess that actually clobbers Loc.		/// the MemoryAccess that actually clobbers Loc.
///		///
/// \returns our clobbering memory access		/// \returns our clobbering memory access
MemoryAccess *MemorySSA::CachingWalker::getClobberingMemoryAccess(		MemoryAccess *MemorySSA::CachingWalker::getClobberingMemoryAccess(
MemoryAccess *StartingAccess, UpwardsMemoryQuery &Q) {		MemoryAccess *StartingAccess, UpwardsMemoryQuery &Q) {
return UpwardsDFSWalk(StartingAccess, Q.StartingLoc, Q, false).first;		MemoryAccess *New = Walker.findClobber(StartingAccess, Q);
		#ifdef EXPENSIVE_CHECKS
		MemoryAccess *NewNoCache =
		Walker.findClobber(StartingAccess, Q, /*UseWalkerCache=*/false);
		assert(NewNoCache == New && "Cache made us hand back a different result?");
		#endif
		if (AutoResetWalker)
		resetClobberWalker();
		return New;
}		}
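
The EXPENSIVE_CHECKS block above answers the same query with and without the cache and asserts that both answers match. Here is a minimal standalone sketch of that pattern, assuming a toy model: ToyWalker, its parent-link graph, and its integer node ids are hypothetical and only mirror the UseWalkerCache switch of findClobber.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical walker: finds the nearest clobber at or above Start by
// following parent links. The root (its own parent) plays the role of
// liveOnEntry and always terminates the walk.
class ToyWalker {
  std::vector<int> Parent;
  std::vector<bool> IsClobber;
  std::unordered_map<int, int> Cache;

public:
  ToyWalker(std::vector<int> P, std::vector<bool> C)
      : Parent(std::move(P)), IsClobber(std::move(C)) {
    assert(Parent.size() == IsClobber.size() && "Mismatched graph description");
  }

  int findClobber(int Start, bool UseWalkerCache = true) {
    assert(Start >= 0 && static_cast<std::size_t>(Start) < Parent.size());
    int Cur = Start;
    while (!IsClobber[Cur] && Parent[Cur] != Cur) {
      if (UseWalkerCache) {
        auto It = Cache.find(Cur);
        if (It != Cache.end()) {
          Cur = It->second; // Cached answers are always final.
          break;
        }
      }
      Cur = Parent[Cur];
    }
    // Never cache a node to itself.
    if (UseWalkerCache && Cur != Start)
      Cache[Start] = Cur;
    return Cur;
  }
};
```

A caller can then compare `W.findClobber(N)` against `W.findClobber(N, /*UseWalkerCache=*/false)`; any disagreement points at a stale or incorrect cache entry rather than at the walk itself.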

MemoryAccess *MemorySSA::CachingWalker::getClobberingMemoryAccess(		MemoryAccess *MemorySSA::CachingWalker::getClobberingMemoryAccess(
MemoryAccess *StartingAccess, MemoryLocation &Loc) {		MemoryAccess *StartingAccess, MemoryLocation &Loc) {
if (isa<MemoryPhi>(StartingAccess))		if (isa<MemoryPhi>(StartingAccess))
return StartingAccess;		return StartingAccess;

auto *StartingUseOrDef = cast<MemoryUseOrDef>(StartingAccess);		auto *StartingUseOrDef = cast<MemoryUseOrDef>(StartingAccess);
[... 18 lines not shown ...]	MemoryAccess *MemorySSA::CachingWalker::getClobberingMemoryAccess(

// Unlike the other function, do not walk to the def of a def, because we are		// Unlike the other function, do not walk to the def of a def, because we are
// handed something we already believe is the clobbering access.		// handed something we already believe is the clobbering access.
MemoryAccess *DefiningAccess = isa<MemoryUse>(StartingUseOrDef)		MemoryAccess *DefiningAccess = isa<MemoryUse>(StartingUseOrDef)
? StartingUseOrDef->getDefiningAccess()		? StartingUseOrDef->getDefiningAccess()
: StartingUseOrDef;		: StartingUseOrDef;

MemoryAccess *Clobber = getClobberingMemoryAccess(DefiningAccess, Q);		MemoryAccess *Clobber = getClobberingMemoryAccess(DefiningAccess, Q);
// Only cache this if it wouldn't make Clobber point to itself.
if (Clobber != StartingAccess)
Cache.insert(Q.OriginalAccess, Clobber, Q.StartingLoc, Q.IsCall);
DEBUG(dbgs() << "Starting Memory SSA clobber for " << *I << " is ");		DEBUG(dbgs() << "Starting Memory SSA clobber for " << *I << " is ");
DEBUG(dbgs() << *StartingUseOrDef << "\n");		DEBUG(dbgs() << *StartingUseOrDef << "\n");
DEBUG(dbgs() << "Final Memory SSA clobber for " << *I << " is ");		DEBUG(dbgs() << "Final Memory SSA clobber for " << *I << " is ");
DEBUG(dbgs() << *Clobber << "\n");		DEBUG(dbgs() << *Clobber << "\n");
return Clobber;		return Clobber;
}		}

MemoryAccess *		MemoryAccess *
[... 16 lines not shown ...]	MemorySSA::CachingWalker::getClobberingMemoryAccess(const Instruction *I) {
MemoryAccess *DefiningAccess = StartingAccess->getDefiningAccess();		MemoryAccess *DefiningAccess = StartingAccess->getDefiningAccess();

// At this point, DefiningAccess may be the live on entry def.		// At this point, DefiningAccess may be the live on entry def.
// If it is, we will not get a better result.		// If it is, we will not get a better result.
if (MSSA->isLiveOnEntryDef(DefiningAccess))		if (MSSA->isLiveOnEntryDef(DefiningAccess))
return DefiningAccess;		return DefiningAccess;

MemoryAccess *Result = getClobberingMemoryAccess(DefiningAccess, Q);		MemoryAccess *Result = getClobberingMemoryAccess(DefiningAccess, Q);
// DFS won't cache a result for DefiningAccess. So, if DefiningAccess isn't
// our clobber, be sure that it gets a cache entry, too.
if (Result != DefiningAccess)
Cache.insert(DefiningAccess, Result, Q.StartingLoc, Q.IsCall);
Cache.insert(Q.OriginalAccess, Result, Q.StartingLoc, Q.IsCall);
// TODO: When this implementation is more mature, we may want to figure out
// what this additional caching buys us. It's most likely A Good Thing.
if (Q.IsCall)
for (const MemoryAccess *MA : Q.VisitedCalls)
if (MA != Result)
Cache.insert(MA, Result, Q.StartingLoc, Q.IsCall);

DEBUG(dbgs() << "Starting Memory SSA clobber for " << *I << " is ");		DEBUG(dbgs() << "Starting Memory SSA clobber for " << *I << " is ");
DEBUG(dbgs() << *DefiningAccess << "\n");		DEBUG(dbgs() << *DefiningAccess << "\n");
DEBUG(dbgs() << "Final Memory SSA clobber for " << *I << " is ");		DEBUG(dbgs() << "Final Memory SSA clobber for " << *I << " is ");
DEBUG(dbgs() << *Result << "\n");		DEBUG(dbgs() << *Result << "\n");

return Result;		return Result;
}		}

[... 20 lines not shown ...]

test/Transforms/Util/MemorySSA/cyclicphi.ll

[... 50 lines not shown ...]	; CHECK-NEXT: %tmp69 = load i64, i64* %g, align 8
%tmp69 = load i64, i64* %g, align 8		%tmp69 = load i64, i64* %g, align 8
; CHECK: 1 = MemoryDef(3)		; CHECK: 1 = MemoryDef(3)
; CHECK-NEXT: store i64 %tmp69, i64* %g, align 8		; CHECK-NEXT: store i64 %tmp69, i64* %g, align 8
store i64 %tmp69, i64* %g, align 8		store i64 %tmp69, i64* %g, align 8
br label %bb77		br label %bb77

bb77: ; preds = %bb68, %bb26		bb77: ; preds = %bb68, %bb26
; CHECK: 2 = MemoryPhi({bb26,3},{bb68,1})		; CHECK: 2 = MemoryPhi({bb26,3},{bb68,1})
; FIXME: This should be MemoryUse(liveOnEntry)		; CHECK: MemoryUse(liveOnEntry)
; CHECK: MemoryUse(3)
; CHECK-NEXT: %tmp78 = load i64, i64* %tmp25, align 8		; CHECK-NEXT: %tmp78 = load i64, i64* %tmp25, align 8
%tmp78 = load i64, i64* %tmp25, align 8		%tmp78 = load i64, i64* %tmp25, align 8
br label %bb26		br label %bb26
}		}

; CHECK-LABEL: define void @quux_dominated		; CHECK-LABEL: define void @quux_dominated
define void @quux_dominated(%struct.hoge* noalias %f, i64* noalias %g) align 2 {		define void @quux_dominated(%struct.hoge* noalias %f, i64* noalias %g) align 2 {
%tmp = getelementptr inbounds %struct.hoge, %struct.hoge* %f, i64 0, i32 1, i32 0		%tmp = getelementptr inbounds %struct.hoge, %struct.hoge* %f, i64 0, i32 1, i32 0
[... 56 lines not shown ...]

test/Transforms/Util/MemorySSA/phi-translation.ll

[... 132 lines not shown ...]	; CHECK-NEXT: store i8 1, i8* %p2
store i8 1, i8* %p2		store i8 1, i8* %p2
br label %loop.3		br label %loop.3

loop.3:		loop.3:
; CHECK: 5 = MemoryPhi({loop.1,2},{loop.2,3})		; CHECK: 5 = MemoryPhi({loop.1,2},{loop.2,3})
; CHECK: 4 = MemoryDef(5)		; CHECK: 4 = MemoryDef(5)
; CHECK-NEXT: store i8 2, i8* %p2		; CHECK-NEXT: store i8 2, i8* %p2
store i8 2, i8* %p2		store i8 2, i8* %p2
; FIXME: This should be MemoryUse(1)		; CHECK: MemoryUse(1)
; CHECK: MemoryUse(5)
; CHECK-NEXT: load i8, i8* %p1		; CHECK-NEXT: load i8, i8* %p1
load i8, i8* %p1		load i8, i8* %p1
br i1 undef, label %loop.2, label %loop.1		br i1 undef, label %loop.2, label %loop.1
}		}

; CHECK-LABEL: define void @looped_visitedonlyonce		; CHECK-LABEL: define void @looped_visitedonlyonce
define void @looped_visitedonlyonce(i8* noalias %p1, i8* noalias %p2) {		define void @looped_visitedonlyonce(i8* noalias %p1, i8* noalias %p2) {
br label %while.cond		br label %while.cond
[... 32 lines not shown ...]

Revision Contents

Diff 62045