This is an archive of the discontinued LLVM Phabricator instance.

limit the number of instructions per block examined by dead store elimination
ClosedPublic

Authored by inglorion on Dec 15 2015, 11:10 AM.

Details

Reviewers

george.burgess.iv
bruno
reames
davidxl
• dberlin
dexonsmith

Commits

rG3db176410a19: limit the number of instructions per block examined by dead store elimination
rL279833: limit the number of instructions per block examined by dead store elimination

Summary

Dead store elimination gets very expensive when large numbers of instructions need to be analyzed. This patch limits the number of instructions analyzed per store to the value of the memdep-block-scan-limit parameter (which defaults to 100). This resulted in no observed difference in performance of the generated code, and no change in the statistics for the dead store elimination pass, but improved compilation time on some files by more than an order of magnitude.

Diff Detail

Event Timeline

inglorion updated this revision to Diff 42880.Dec 15 2015, 11:10 AM

inglorion retitled this revision from to limit the number of instructions per block examined by dead store elimination.

inglorion updated this object.

inglorion added a reviewer: dexonsmith.Dec 15 2015, 11:13 AM

inglorion added a subscriber: llvm-commits.

bruno added a reviewer: bruno.Jan 5 2016, 3:41 PM

mbodart added a subscriber: mbodart.Jan 5 2016, 4:39 PM

hfinkel added reviewers: • dberlin, george.burgess.iv, reames.Feb 2 2016, 2:29 PM

Can you provide tests for these cases, so we at least have some idea when the limit should hit?

Any update on this patch? I have an internal source file for which DSE is taking an inordinate amount of time:

1427.7928 ( 99.9%)   0.1237 ( 61.0%)  1427.9166 ( 99.9%)  1428.6744 ( 99.9%)  Dead Store Elimination
 ...
1428.8521 (100.0%)   0.2030 (100.0%)  1429.0551 (100.0%)  1429.8275 (100.0%)  Total

Applying this patch reduces this immensely:

17.7091 ( 94.3%)   0.0123 ( 14.7%)  17.7214 ( 93.9%)  17.7251 ( 93.8%)  Dead Store Elimination
 ...
18.7871 (100.0%)   0.0837 (100.0%)  18.8707 (100.0%)  18.8947 (100.0%)  Total

Easwaran may also hit some very long compile time problems due to this.

Feel free to invest time switching it to memoryssa :)

yes -- MemSSA will get there. Short term workaround might also be needed :)

I'd be a lot more willing to say yes (though you can get yes from someone
else too!) if someone were committed to doing the work to actually make it
short term.

Because the short term things all over the place rarely end up short term :)
While the plan is to convert the existing memdep passes, me doing it alone will
be a longer process than folks are likely to want here.

In D15537#397257, @davidxl wrote:

yes -- MemSSA will get there. Short term workaround might also be needed :)

I think we need a stopgap fix for this non-linear compile time issue. I don't think waiting until it is reimplemented to use MemorySSA is a good strategy given the extreme compile time issues, and I can't even find a switch to disable this pass.

BTW, the author had provided a test case on the mailing list (there are two threads for this patch):

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160201/330335.html

FWIW, I think it is reasonable to work around quadratic compile time problems with temporary limits until a well-scaling algorithm lands. However, I don't think this patch as-is is the correct approach.

lib/Analysis/MemoryDependenceAnalysis.cpp
398–399	Exposing the search limit of MemDep and allowing the dynamic search depth to be surfaced in the API seems like a really bad API. Fundamentally, I think we should first bound the DSE scan when it has to call through to MemDep, which this effectively does, but in a very roundabout way. Then, if it is necessary to give MemDep increasingly small scan limits, you should just add a vanilla input here, and decrease that limit in the DSE loop below rather than carrying over the limit from one query to another query. But I'd really like to understand if all you actually need is to limit the DSE scan.

davidxl added inline comments.Apr 12 2016, 9:49 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
398–399	One data point -- DSE is not the only client pass that triggers the problem. We have seen GVN triggered one as well. (I have not examined this patch in detail).

chandlerc added inline comments.Apr 12 2016, 9:57 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
398–399	Without a very clear description of why the same limiting technique will work with two different consumers, I think it is a big mistake to put it into the API.

Thank you for all your comments, folks. I will be happy to improve this patch so we can stop the long compiles while a better solution is being worked on.

@chandlerc, I am not 100% sure I understand your suggestion. If I understand you correctly, you are saying that you think it is bad API design to add the getDefaultBlockScanLimit to MemDep and allow clients of MemDep to pass in a limit to getPointerDependencyFrom. Ok. Then you say "we should first bound the DSE scan when it has to call through to MemDep" and "you should just add a vanilla input here" (in getSimplePointerDependencyFrom). What do you mean? Perhaps you could illustrate the API you're looking for with some pseudocode.

For what it's worth, what I am trying to accomplish with this patch is essentially to better enforce the limit that MemDep already has. Without the patch, a single call to getSimplePointerDependencyFrom examines at most (by default) the last 100 instructions before the one where the search starts. DSE then looks at the result to decide whether it can definitely perform, or definitely not perform, the optimization. In some cases the result does not settle the question either way, so we call into MemDep again to get the next possibly relevant instruction, which searches up to 100 instructions back from the location of the previous result. Since this can continue indefinitely, we can end up far more than 100 instructions away from where the search originally started, which is what causes the long compiles. This patch limits the search to 100 instructions (or whatever the parameter is set to) from where the DSE pass started the search, effectively closing the loophole that let us run past the limit the original search had.
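The difference between the two limiting schemes can be illustrated with a toy model. The sketch below is not LLVM code: `Block`, `WalkStats`, `makeBlock`, `scanBack`, `walkPerQueryLimit` and `walkSharedLimit` are hypothetical names, a basic block is reduced to a vector of flags marking possible clobbers, and `kScanLimit` merely stands in for memdep-block-scan-limit. It also assumes that every clobber found is inconclusive, so the caller always asks for the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not LLVM code: true marks an instruction that may clobber the
// queried location, false marks an unrelated instruction.
using Block = std::vector<bool>;

constexpr unsigned kScanLimit = 100; // stands in for memdep-block-scan-limit

struct WalkStats {
  std::size_t Examined = 0; // instructions looked at across all queries
  std::size_t Clobbers = 0; // clobbers handed back to the caller
};

// Build a block of N instructions with a clobber at every Stride-th index.
Block makeBlock(std::size_t N, std::size_t Stride) {
  assert(Stride > 0 && "stride must be positive");
  Block BB(N, false);
  for (std::size_t I = 0; I < N; I += Stride)
    BB[I] = true;
  return BB;
}

// Scan backwards from Pos (exclusive), charging each examined instruction
// against Budget. Returns the index of the nearest clobber, or -1 when the
// budget is exhausted or the start of the block is reached.
long scanBack(const Block &BB, long Pos, unsigned &Budget, WalkStats &Stats) {
  while (Pos > 0 && Budget > 0) {
    --Pos;
    --Budget;
    ++Stats.Examined;
    if (BB[static_cast<std::size_t>(Pos)])
      return Pos;
  }
  return -1;
}

// Behaviour described for the unpatched code: each follow-up query gets a
// fresh budget, so the walk can travel arbitrarily far from its start.
WalkStats walkPerQueryLimit(const Block &BB) {
  WalkStats Stats;
  long Pos = static_cast<long>(BB.size());
  while (true) {
    unsigned Budget = kScanLimit;
    Pos = scanBack(BB, Pos, Budget, Stats);
    if (Pos < 0)
      break;
    ++Stats.Clobbers;
  }
  return Stats;
}

// Behaviour described for the patched code: one budget is carried across
// all queries issued on behalf of the same store.
WalkStats walkSharedLimit(const Block &BB) {
  WalkStats Stats;
  long Pos = static_cast<long>(BB.size());
  unsigned Budget = kScanLimit;
  while (true) {
    Pos = scanBack(BB, Pos, Budget, Stats);
    if (Pos < 0)
      break;
    ++Stats.Clobbers;
  }
  return Stats;
}
```

Under these assumptions, for a 10,000-instruction block with a possible clobber every 50 instructions, the per-query walk examines all 10,000 instructions on behalf of a single store, while the shared budget stops after 100. Because the walk is repeated for each store in the block, the per-query variant is the one that can turn quadratic.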

Ping ..

Chandler, do you have time to reply to Bob's question? What this patch
does essentially is to enforce a global limit for MemDep queries instead
of using a per-query local limit as is done today.

Your suggestion of having a fixed limit in the DSE loop may not work well in
some cases and may lead to more pessimistic results. The main problem with that
approach is that it does not know how expensive each individual query is
(how many instructions are touched in the backward walk). If there is a way to
communicate this cost information back to the caller of the MD interface, that
will be fine too -- but that is essentially the same as what this patch
does.

We have been beaten badly by this problem in both sanitizer builds and FDO
builds. With sanitizer builds, we can throw in O0 as a workaround (but
also suffer at runtime), but for FDO this is not an option.

David

Ping ..

Ping. Chandler, it has been a while (3 months) since my last reply (May 6) to the thread.

thanks,

The updated patch is here https://reviews.llvm.org/F2313386

The patch fixes the issue in https://llvm.org/bugs/show_bug.cgi?id=29064

davide added a subscriber: davide.Aug 22 2016, 12:26 PM

I just want to make sure I understand the big picture view of this change. I'm going to try to summarize, please correct me if I'm wrong.

Today, we will walk back through the list of defs/clobbers provided by MDA (within a single block) without limit in DSE. Internally, MDA will only find defs/clobbers which are within a limited distance of each other. As a result, a series of adjacent clobbers will be scanned, but the same series of adjacent clobbers with a single long break of non-memory related instructions will not be. Right?

With the patch, we will walk backwards a fixed distance (in number of instructions) considering any def/clobber we see in that window.

Is that a correct summary?

(p.s. Using the same default value from the original implementation with the new one seems highly suspect since the old implementation would have been much more aggressive in practice..)

reames added inline comments.Aug 23 2016, 12:04 PM

lib/Transforms/Scalar/DeadStoreElimination.cpp
512–513	FYI: this patch clearly needs rebased
512–513	I'm wondering whether we can solve the practical problem with a caching change. What if we simply chose to cache the intermediate MDA results? (Note that MDA does not do this for the getPointerDependencyFrom interface.) I'm picturing a simple cache structure of Map<pair<MemoryLoc, Instruction>, MemDepResult>. This wouldn't reduce the worst case complexity (it could be that each instruction has a different memory location associated with it which may alias with all others), but how many unique memory locations do we see in practice? And out of those, how many are mayalias (i.e. not a possible read or def which terminates a chain)?
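The caching idea in the comment above might look roughly like the following. This is only a sketch under stated assumptions, not the MDA interface: `LocId`, `InstId`, `DepResult` and `DepCache` are hypothetical stand-ins for MemoryLocation, Instruction, MemDepResult and the proposed map, and the underlying query is passed in as a callback.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-ins for MemoryLocation, Instruction and MemDepResult.
using LocId = int;
using InstId = int;
struct DepResult {
  InstId DependsOn;
};

// Memoizes (location, starting instruction) -> dependency result, so that a
// repeated intermediate query does not rescan the block.
class DepCache {
public:
  using QueryFn = std::function<DepResult(LocId, InstId)>;

  explicit DepCache(QueryFn Fn) : Compute(std::move(Fn)) {}

  DepResult get(LocId Loc, InstId From) {
    auto Key = std::make_pair(Loc, From);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // hit: reuse the earlier answer
    DepResult R = Compute(Loc, From); // miss: run the uncached query once
    Cache.emplace(Key, R);
    return R;
  }

  // Assumption: cached answers become stale once the block changes (for
  // example when a dead store is deleted), so the caller must clear them.
  void clear() { Cache.clear(); }

private:
  QueryFn Compute;
  std::map<std::pair<LocId, InstId>, DepResult> Cache;
};
```

As the comment notes, this would not change the worst case: if every instruction queries a distinct location, every lookup is a miss. Any such cache would also need an invalidation story, which the sketch reduces to a blunt `clear()`.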

Thanks for rebasing my changes, @davidxl! Since your updated version applies cleanly and matches the changes I wanted to make, I'm updating the code on Phabricator to match yours.

lgtm

Based on discussions, we will take this as a solution to tackle the quadratic behavior in DSE until memSSA is in place.

This revision is now accepted and ready to land.Aug 25 2016, 9:38 AM

inglorion closed this revision.Aug 26 2016, 9:25 AM

Revision Contents

Path                                                Size
include/llvm/Analysis/MemoryDependenceAnalysis.h    17 lines
lib/Analysis/MemoryDependenceAnalysis.cpp           22 lines
lib/Transforms/Scalar/DeadStoreElimination.cpp      12 lines

Diff 69170

include/llvm/Analysis/MemoryDependenceAnalysis.h

Show First 20 Lines • Show All 344 Lines • ▼ Show 20 Lines	private:
PredIteratorCache PredCache;		PredIteratorCache PredCache;

public:		public:
MemoryDependenceResults(AliasAnalysis &AA, AssumptionCache &AC,		MemoryDependenceResults(AliasAnalysis &AA, AssumptionCache &AC,
const TargetLibraryInfo &TLI,		const TargetLibraryInfo &TLI,
DominatorTree &DT)		DominatorTree &DT)
: AA(AA), AC(AC), TLI(TLI), DT(DT) {}		: AA(AA), AC(AC), TLI(TLI), DT(DT) {}

		/// Some methods limit the number of instructions they will examine.
		/// The return value of this method is the default limit that will be
		/// used if no limit is explicitly passed in.
		unsigned getDefaultBlockScanLimit() const;

/// Returns the instruction on which a memory operation depends.		/// Returns the instruction on which a memory operation depends.
///		///
/// See the class comment for more details. It is illegal to call this on		/// See the class comment for more details. It is illegal to call this on
/// non-memory instructions.		/// non-memory instructions.
MemDepResult getDependency(Instruction *QueryInst);		MemDepResult getDependency(Instruction *QueryInst);

/// Perform a full dependency query for the specified call, returning the set		/// Perform a full dependency query for the specified call, returning the set
/// of blocks that the value is potentially live across.		/// of blocks that the value is potentially live across.
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	public:
void invalidateCachedPredecessors();		void invalidateCachedPredecessors();

/// Returns the instruction on which a memory location depends.		/// Returns the instruction on which a memory location depends.
///		///
/// If isLoad is true, this routine ignores may-aliases with read-only		/// If isLoad is true, this routine ignores may-aliases with read-only
/// operations. If isLoad is false, this routine ignores may-aliases		/// operations. If isLoad is false, this routine ignores may-aliases
/// with reads from read-only locations. If possible, pass the query		/// with reads from read-only locations. If possible, pass the query
/// instruction as well; this function may take advantage of the metadata		/// instruction as well; this function may take advantage of the metadata
/// annotated to the query instruction to refine the result.		/// annotated to the query instruction to refine the result. \p Limit
		/// can be used to set the maximum number of instructions that will be
		/// examined to find the pointer dependency. On return, it will be set to
		/// the number of instructions left to examine. If a null pointer is passed
		/// in, the limit will default to the value of -memdep-block-scan-limit.
///		///
/// Note that this is an uncached query, and thus may be inefficient.		/// Note that this is an uncached query, and thus may be inefficient.
MemDepResult getPointerDependencyFrom(const MemoryLocation &Loc, bool isLoad,		MemDepResult getPointerDependencyFrom(const MemoryLocation &Loc, bool isLoad,
BasicBlock::iterator ScanIt,		BasicBlock::iterator ScanIt,
BasicBlock *BB,		BasicBlock *BB,
Instruction *QueryInst = nullptr);		Instruction *QueryInst = nullptr,
		unsigned *Limit = nullptr);

MemDepResult getSimplePointerDependencyFrom(const MemoryLocation &MemLoc,		MemDepResult getSimplePointerDependencyFrom(const MemoryLocation &MemLoc,
bool isLoad,		bool isLoad,
BasicBlock::iterator ScanIt,		BasicBlock::iterator ScanIt,
BasicBlock *BB,		BasicBlock *BB,
Instruction *QueryInst);		Instruction *QueryInst,
		unsigned *Limit = nullptr);

/// This analysis looks for other loads and stores with invariant.group		/// This analysis looks for other loads and stores with invariant.group
/// metadata and the same pointer operand. Returns Unknown if it does not		/// metadata and the same pointer operand. Returns Unknown if it does not
/// find anything, and Def if it can be assumed that 2 instructions load or		/// find anything, and Def if it can be assumed that 2 instructions load or
/// store the same value.		/// store the same value.
/// FIXME: This analysis works only on single block because of restrictions		/// FIXME: This analysis works only on single block because of restrictions
/// at the call site.		/// at the call site.
MemDepResult getInvariantGroupPointerDependency(LoadInst *LI, BasicBlock *BB);		MemDepResult getInvariantGroupPointerDependency(LoadInst *LI, BasicBlock *BB);
▲ Show 20 Lines • Show All 76 Lines • Show Last 20 Lines

lib/Analysis/MemoryDependenceAnalysis.cpp

Show First 20 Lines • Show All 321 Lines • ▼ Show 20 Lines	else if (StoreInst *SI = dyn_cast<StoreInst>(Inst))
return SI->isVolatile();		return SI->isVolatile();
else if (AtomicCmpXchgInst *AI = dyn_cast<AtomicCmpXchgInst>(Inst))		else if (AtomicCmpXchgInst *AI = dyn_cast<AtomicCmpXchgInst>(Inst))
return AI->isVolatile();		return AI->isVolatile();
return false;		return false;
}		}

MemDepResult MemoryDependenceResults::getPointerDependencyFrom(		MemDepResult MemoryDependenceResults::getPointerDependencyFrom(
const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,		const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
BasicBlock *BB, Instruction *QueryInst) {		BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {

if (QueryInst != nullptr) {		if (QueryInst != nullptr) {
if (auto *LI = dyn_cast<LoadInst>(QueryInst)) {		if (auto *LI = dyn_cast<LoadInst>(QueryInst)) {
MemDepResult invariantGroupDependency =		MemDepResult invariantGroupDependency =
getInvariantGroupPointerDependency(LI, BB);		getInvariantGroupPointerDependency(LI, BB);

if (invariantGroupDependency.isDef())		if (invariantGroupDependency.isDef())
return invariantGroupDependency;		return invariantGroupDependency;
}		}
}		}
return getSimplePointerDependencyFrom(MemLoc, isLoad, ScanIt, BB, QueryInst);		return getSimplePointerDependencyFrom(MemLoc, isLoad, ScanIt, BB, QueryInst,
		Limit);
}		}

MemDepResult		MemDepResult
MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,		MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,
BasicBlock *BB) {		BasicBlock *BB) {
Value *LoadOperand = LI->getPointerOperand();		Value *LoadOperand = LI->getPointerOperand();
// It's is not safe to walk the use list of global value, because function		// It's is not safe to walk the use list of global value, because function
// passes aren't allowed to look outside their functions.		// passes aren't allowed to look outside their functions.
Show All 39 Lines	for (Use &Us : Ptr->uses()) {
return MemDepResult::getDef(U);		return MemDepResult::getDef(U);
}		}
}		}
return Result;		return Result;
}		}

MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(		MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,		const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
BasicBlock *BB, Instruction *QueryInst) {		BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {

const Value *MemLocBase = nullptr;		const Value *MemLocBase = nullptr;
int64_t MemLocOffset = 0;		int64_t MemLocOffset = 0;
unsigned Limit = BlockScanLimit;
bool isInvariantLoad = false;		bool isInvariantLoad = false;

		if (!Limit) {
		unsigned DefaultLimit = BlockScanLimit;
		return getSimplePointerDependencyFrom(MemLoc, isLoad, ScanIt, BB, QueryInst,
		&DefaultLimit);
		}

// We must be careful with atomic accesses, as they may allow another thread		// We must be careful with atomic accesses, as they may allow another thread
// to touch this location, clobbering it. We are conservative: if the		// to touch this location, clobbering it. We are conservative: if the
// QueryInst is not a simple (non-atomic) memory access, we automatically		// QueryInst is not a simple (non-atomic) memory access, we automatically
// return getClobber.		// return getClobber.
// If it is simple, we know based on the results of		// If it is simple, we know based on the results of
// "Compiler testing via a theory of sound optimisations in the C11/C++11		// "Compiler testing via a theory of sound optimisations in the C11/C++11
// memory model" in PLDI 2013, that a non-atomic location can only be		// memory model" in PLDI 2013, that a non-atomic location can only be
// clobbered between a pair of a release and an acquire action, with no		// clobbered between a pair of a release and an acquire action, with no
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	while (ScanIt != BB->begin()) {

if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst))		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst))
// Debug intrinsics don't (and can't) cause dependencies.		// Debug intrinsics don't (and can't) cause dependencies.
if (isa<DbgInfoIntrinsic>(II))		if (isa<DbgInfoIntrinsic>(II))
continue;		continue;

// Limit the amount of scanning we do so we don't end up with quadratic		// Limit the amount of scanning we do so we don't end up with quadratic
// running time on extreme testcases.		// running time on extreme testcases.
--Limit;		--*Limit;
if (!Limit)		if (!*Limit)
return MemDepResult::getUnknown();		return MemDepResult::getUnknown();

if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {
// If we reach a lifetime begin or end marker, then the query ends here		// If we reach a lifetime begin or end marker, then the query ends here
// because the value is undefined.		// because the value is undefined.
if (II->getIntrinsicID() == Intrinsic::lifetime_start) {		if (II->getIntrinsicID() == Intrinsic::lifetime_start) {
// FIXME: This only considers queries directly on the invariant-tagged		// FIXME: This only considers queries directly on the invariant-tagged
// pointer, not on query pointers that are indexed off of them. It'd		// pointer, not on query pointers that are indexed off of them. It'd
▲ Show 20 Lines • Show All 1,206 Lines • ▼ Show 20 Lines
void MemoryDependenceWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {		void MemoryDependenceWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesAll();		AU.setPreservesAll();
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequiredTransitive<AAResultsWrapperPass>();		AU.addRequiredTransitive<AAResultsWrapperPass>();
AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();		AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();
}		}

		unsigned MemoryDependenceResults::getDefaultBlockScanLimit() const {
		return BlockScanLimit;
		}

bool MemoryDependenceWrapperPass::runOnFunction(Function &F) {		bool MemoryDependenceWrapperPass::runOnFunction(Function &F) {
auto &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();		auto &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
MemDep.emplace(AA, AC, TLI, DT);		MemDep.emplace(AA, AC, TLI, DT);
return false;		return false;
}		}

lib/Transforms/Scalar/DeadStoreElimination.cpp

Show First 20 Lines • Show All 503 Lines • ▼ Show 20 Lines	static bool isPossibleSelfRead(Instruction *Inst,
if (DepReadLoc.Ptr && AA.isMustAlias(InstReadLoc.Ptr, DepReadLoc.Ptr))		if (DepReadLoc.Ptr && AA.isMustAlias(InstReadLoc.Ptr, DepReadLoc.Ptr))
return false;		return false;

// If DepWrite doesn't read memory or if we can't prove it is a must alias,		// If DepWrite doesn't read memory or if we can't prove it is a must alias,
// then it can't be considered dead.		// then it can't be considered dead.
return true;		return true;
}		}


/// Returns true if the memory which is accessed by the second instruction is not		/// Returns true if the memory which is accessed by the second instruction is not
/// modified between the first and the second instruction.		/// modified between the first and the second instruction.
/// Precondition: Second instruction must be dominated by the first		/// Precondition: Second instruction must be dominated by the first
/// instruction.		/// instruction.
static bool memoryIsNotModifiedBetween(Instruction *FirstI,		static bool memoryIsNotModifiedBetween(Instruction *FirstI,
Instruction *SecondI,		Instruction *SecondI,
AliasAnalysis *AA) {		AliasAnalysis *AA) {
SmallVector<BasicBlock *, 16> WorkList;		SmallVector<BasicBlock *, 16> WorkList;
SmallPtrSet<BasicBlock *, 8> Visited;		SmallPtrSet<BasicBlock *, 8> Visited;
▲ Show 20 Lines • Show All 524 Lines • ▼ Show 20 Lines	for (BasicBlock::iterator BBI = BB.begin(), BBE = BB.end(); BBI != BBE; ) {

// Figure out what location is being stored to.		// Figure out what location is being stored to.
MemoryLocation Loc = getLocForWrite(Inst, *AA);		MemoryLocation Loc = getLocForWrite(Inst, *AA);

// If we didn't get a useful location, fail.		// If we didn't get a useful location, fail.
if (!Loc.Ptr)		if (!Loc.Ptr)
continue;		continue;

		// Loop until we find a store we can eliminate or a load that
		// invalidates the analysis. Without an upper bound on the number of
		// instructions examined, this analysis can become very time-consuming.
		// However, the potential gain diminishes as we process more instructions
		// without eliminating any of them. Therefore, we limit the number of
		// instructions we look at.
		auto Limit = MD->getDefaultBlockScanLimit();
while (InstDep.isDef() || InstDep.isClobber()) {		while (InstDep.isDef() || InstDep.isClobber()) {
// Get the memory clobbered by the instruction we depend on. MemDep will		// Get the memory clobbered by the instruction we depend on. MemDep will
// skip any instructions that 'Loc' clearly doesn't interact with. If we		// skip any instructions that 'Loc' clearly doesn't interact with. If we
// end up depending on a may- or must-aliased load, then we can't optimize		// end up depending on a may- or must-aliased load, then we can't optimize
// away the store and we bail out. However, if we depend on something		// away the store and we bail out. However, if we depend on something
// that overwrites the memory location we can potentially optimize it.		// that overwrites the memory location we can potentially optimize it.
//		//
// Find out what memory location the dependent instruction stores.		// Find out what memory location the dependent instruction stores.
▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	while (InstDep.isDef() || InstDep.isClobber()) {
// we can remove the first store to P even though we don't know if P and Q		// we can remove the first store to P even though we don't know if P and Q
// alias.		// alias.
if (DepWrite == &BB.front()) break;		if (DepWrite == &BB.front()) break;

// Can't look past this instruction if it might read 'Loc'.		// Can't look past this instruction if it might read 'Loc'.
if (AA->getModRefInfo(DepWrite, Loc) & MRI_Ref)		if (AA->getModRefInfo(DepWrite, Loc) & MRI_Ref)
break;		break;

InstDep = MD->getPointerDependencyFrom(Loc, false,		InstDep = MD->getPointerDependencyFrom(Loc, /*isLoad=*/ false,
DepWrite->getIterator(), &BB);		DepWrite->getIterator(), &BB,
		/*QueryInst=*/ nullptr, &Limit);
}		}
}		}

if (EnablePartialOverwriteTracking)		if (EnablePartialOverwriteTracking)
MadeChange |= removePartiallyOverlappedStores(AA, DL, IOL);		MadeChange |= removePartiallyOverlappedStores(AA, DL, IOL);

// If this block ends in a return, unwind, or unreachable, all allocas are		// If this block ends in a return, unwind, or unreachable, all allocas are
// dead at its end, which means stores to them are also dead.		// dead at its end, which means stores to them are also dead.
▲ Show 20 Lines • Show All 88 Lines • Show Last 20 Lines