This is an archive of the discontinued LLVM Phabricator instance.

limit the number of instructions per block examined by dead store elimination
ClosedPublic

Authored by inglorion on Dec 15 2015, 11:10 AM.


Details

Reviewers

george.burgess.iv
bruno
reames
davidxl
dberlin
dexonsmith

Commits

rG3db176410a19: limit the number of instructions per block examined by dead store elimination
rL279833: limit the number of instructions per block examined by dead store elimination

Summary

Dead store elimination gets very expensive when large numbers of instructions need to be analyzed. This patch limits the number of instructions analyzed per store to the value of the memdep-block-scan-limit parameter (which defaults to 100). This resulted in no observed difference in performance of the generated code, and no change in the statistics for the dead store elimination pass, but improved compilation time on some files by more than an order of magnitude.

Diff Detail

Event Timeline

inglorion updated this revision to Diff 42880. Dec 15 2015, 11:10 AM

inglorion retitled this revision to "limit the number of instructions per block examined by dead store elimination".

inglorion updated this object.

inglorion added a reviewer: dexonsmith. Dec 15 2015, 11:13 AM

inglorion added a subscriber: llvm-commits.

bruno added a reviewer: bruno. Jan 5 2016, 3:41 PM

mbodart added a subscriber: mbodart. Jan 5 2016, 4:39 PM

hfinkel added reviewers: dberlin, george.burgess.iv, reames. Feb 2 2016, 2:29 PM

Can you provide tests for these cases, so we at least have some idea when the limit should hit?

Any update on this patch? I have an internal source file for which DSE is taking an inordinate amount of time:

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
1427.7928 ( 99.9%)   0.1237 ( 61.0%)  1427.9166 ( 99.9%)  1428.6744 ( 99.9%)  Dead Store Elimination
 ...
1428.8521 (100.0%)   0.2030 (100.0%)  1429.0551 (100.0%)  1429.8275 (100.0%)  Total

Applying this patch reduces this immensely:

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
17.7091 ( 94.3%)   0.0123 ( 14.7%)  17.7214 ( 93.9%)  17.7251 ( 93.8%)  Dead Store Elimination
 ...
18.7871 (100.0%)   0.0837 (100.0%)  18.8707 (100.0%)  18.8947 (100.0%)  Total

Easwaran may also hit some very long compile time problems due to this.

Feel free to invest time switching it to memoryssa :)

yes -- MemSSA will get there. Short term workaround might also be needed :)

I'd be a lot more willing to say yes (though you can get yes from someone
else too!) if someone was committed to doing the work to actually make it
short term.

Because the short term things all over the place rarely end up short term :)
While the plan is to convert the existing memdep passes, me doing it alone will
be a longer process than folks are likely to want here.

In D15537#397257, @davidxl wrote:

yes -- MemSSA will get there. Short term workaround might also be needed :)

I think we need a stopgap fix for this non-linear compile time issue. I don't think waiting until it is reimplemented to use MemorySSA is a good strategy given the extreme compile time issues, and I can't even find a switch to disable this pass.

BTW, the author had provided a test case on the mailing list (there are two threads for this patch):

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160201/330335.html

FWIW, I think it is reasonable to workaround quadratic compile time problems with temporary limits until a well scaling algorithm lands. However, I don't think this patch as-is is the correct approach.

lib/Analysis/MemoryDependenceAnalysis.cpp
455–461	Exposing the search limit of MemDep and allowing the dynamic search depth to be surfaced in the API seems like a really bad API. Fundamentally, I think we should first bound the DSE scan when it has to call through to MemDep, which this effectively does, but in a very round-about way. Then, if it is necessary to give MemDep increasingly small scan limits, you should just add a vanilla input here, and decrease that limit in the DSE loop below rather than carrying over the limit from one query to another query. But I'd really like to understand if all you actually need is to limit the DSE scan.

davidxl added inline comments. Apr 12 2016, 9:49 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
455–461	One data point -- DSE is not the only client pass that triggers the problem. We have seen GVN triggered one as well. (I have not examined this patch in detail).

chandlerc added inline comments. Apr 12 2016, 9:57 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
455–461	Without a very clear description of why the same limiting technique will work with two different consumers, I think it is a big mistake to put it into the API.

Thank you for all your comments, folks. I will be happy to improve this patch so we can stop the long compiles while a better solution is being worked on.

@chandlerc, I am not 100% sure I understand your suggestion. If I understand you correctly, you are saying that you think it is bad API design to add the getDefaultBlockScanLimit to MemDep and allow clients of MemDep to pass in a limit to getPointerDependencyFrom. Ok. Then you say "we should first bound the DSE scan when it has to call through to MemDep" and "you should just add a vanilla input here" (in getSimplePointerDependencyFrom). What do you mean? Perhaps you could illustrate the API you're looking for with some pseudocode.

For what it's worth, what I am trying to accomplish with this patch is essentially to better enforce the limit that MemDep already has. Without the patch, we limit the number of instructions we will examine in a single call to getSimplePointerDependencyFrom to (by default) the last 100 instructions before the one where the search starts. In DSE, we then look at the result to determine if it lets us determine whether we can definitely perform or definitely not perform the optimization, but, in some cases, the result actually doesn't tell us one way or another, so we call into MemDep again to get the next possibly relevant instruction. This then searches up to 100 instructions from the location of the previous result. Since this can continue indefinitely, we can end up way more than 100 instructions away from where the search originally started, which is what causes the long compiles. This patch limits the search to 100 instructions (or whatever the parameter is set to) from where the DSE pass started the search, effectively closing the loophole that let us run past the limit the original search had.
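The loophole described above can be illustrated with a toy model. This is only a sketch, not LLVM code: a block is reduced to a vector of flags, and `scanBackward` and `examinedForStore` are hypothetical names invented for the illustration. The budget check mirrors the `--*Limit; if (!*Limit)` pattern from the patch, and a positive limit is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model (an assumption for illustration): true marks an instruction that
// may clobber the queried location but does not settle whether the store is
// dead, so the client has to query again from there.
using Block = std::vector<bool>;

// Walk backward from Start (exclusive). Every instruction examined consumes
// one unit of Budget. Returns the index of the nearest inconclusive clobber,
// or nullopt if the budget ran out or the start of the block was reached.
// Budget must be positive on entry.
std::optional<std::size_t> scanBackward(const Block &BB, std::size_t Start,
                                        unsigned &Budget,
                                        std::size_t &Examined) {
  while (Start != 0) {
    --Start;
    ++Examined;
    if (--Budget == 0)
      return std::nullopt; // plays the role of MemDepResult::getUnknown()
    if (BB[Start])
      return Start;
  }
  return std::nullopt;
}

// Counts the instructions examined for one store. With ShareBudget == false
// every query gets a fresh budget (behavior before the patch); with
// ShareBudget == true one budget is carried across queries (with the patch).
std::size_t examinedForStore(const Block &BB, std::size_t StoreIdx,
                             unsigned Limit, bool ShareBudget) {
  std::size_t Examined = 0;
  unsigned Shared = Limit;
  std::optional<std::size_t> Dep = StoreIdx;
  while (Dep) {
    unsigned Fresh = Limit;
    Dep = scanBackward(BB, *Dep, ShareBudget ? Shared : Fresh, Examined);
  }
  return Examined;
}
```

For a 1000-instruction block with an inconclusive clobber every 50 instructions and a limit of 100, the per-query budget never runs out and the whole block is walked, while the shared budget stops after 100 instructions.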

Ping ..

Chandler, do you have time to reply to Bob's question? What this patch
does essentially is to enforce a global limit for MemDep queries instead
of using a per-query local limit as is done today.

Your suggestion of having a fixed limit in the DSE loop may not work well in
some cases and may lead to more pessimistic results. The main problem with that
approach is that it does not know how expensive each individual query is
(how many instructions are touched in the backward walk). If there is a way to
communicate this cost information back to the caller of the MD interface, that
will be fine too -- but that is essentially the same as what this patch
does.
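One possible reading of the alternative discussed here, sketched as a toy model rather than LLVM code: the limit is a plain by-value input, the query reports how many instructions it examined, and the caller shrinks its own budget by that cost. `ScanResult`, `scanWithCost`, and `totalCost` are hypothetical names, and the block is again reduced to a vector of flags.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model (an assumption for illustration): true marks an inconclusive
// clobber that forces the client to query again.
using Block = std::vector<bool>;

struct ScanResult {
  std::optional<std::size_t> Dep; // nearest inconclusive clobber, if any
  unsigned Cost;                  // instructions examined by this query
};

// Limit is passed by value; the callee never modifies the caller's budget.
// At most Limit instructions are examined.
ScanResult scanWithCost(const Block &BB, std::size_t Start, unsigned Limit) {
  unsigned Cost = 0;
  while (Start != 0 && Cost < Limit) {
    --Start;
    ++Cost;
    if (BB[Start])
      return {Start, Cost};
  }
  return {std::nullopt, Cost};
}

// The caller owns the budget and reduces it by each query's reported cost.
// Cost never exceeds the limit passed in, so the subtraction cannot wrap.
std::size_t totalCost(const Block &BB, std::size_t StoreIdx, unsigned Budget) {
  std::size_t Total = 0;
  std::optional<std::size_t> Dep = StoreIdx;
  while (Dep && Budget != 0) {
    ScanResult R = scanWithCost(BB, *Dep, Budget);
    Budget -= R.Cost;
    Total += R.Cost;
    Dep = R.Dep;
  }
  return Total;
}
```

In this model the total work per store is bounded by the initial budget either way, which is consistent with the remark that reporting the cost back is essentially the same as carrying the limit through a pointer.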

We have been bitten badly by this problem in both sanitizer builds and FDO
builds. With sanitizer builds, we can throw in O0 as a workaround (but
also suffer at runtime), but for FDO this is not an option.

David

Ping ..

Ping. Chandler, it has been a while (3 months) since my last reply (May 6) to the thread.

thanks,

The updated patch is here https://reviews.llvm.org/F2313386

The patch fixes the issue in https://llvm.org/bugs/show_bug.cgi?id=29064

davide added a subscriber: davide. Aug 22 2016, 12:26 PM

I just want to make sure I understand the big picture view of this change. I'm going to try to summarize, please correct me if I'm wrong.

Today, we will walk back through the list of defs/clobbers provided by MDA (within a single block) without limit in DSE. Internally, MDA will only find defs/clobbers which are within a limited distance of each other. As a result, a series of adjacent clobbers will be scanned, but the same series of adjacent clobbers with a single long break of non-memory related instructions will not be. Right?

With the patch, we will walk backwards a fixed distance (in number of instructions) considering any def/clobber we see in that window.

Is that a correct summary?

(p.s. Using the same default value from the original implementation with the new one seems highly suspect since the old implementation would have been much more aggressive in practice..)

reames added inline comments. Aug 23 2016, 12:04 PM

lib/Transforms/Scalar/DeadStoreElimination.cpp
490	I'm wondering whether we can solve the practical problem with a caching change. What if we simply chose to cache the intermediate MDA results? (Note that MDA does not do this for the getPointerDependencyFrom interface.) I'm picturing a simple cache structure of Map<pair<MemoryLoc, Instruction>, MemDepResult>. This wouldn't reduce the worst case complexity (it could be that each instruction has a different memory location associated with it which may alias with all others), but how many unique memory locations do we see in practice? And out of those, how many are mayalias (i.e. not a possible read or def which terminates a chain)?
522	FYI: this patch clearly needs rebased
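The caching idea from the inline comment on line 490 could look roughly like the following. This is a toy sketch under stated assumptions, not the MDA interface: `LocId`, `Block`, and `CachedScanner` are hypothetical stand-ins, a memory location is reduced to an integer, and only exact matches count as dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a location is an integer, and each instruction
// writes exactly one location (-1 means it does not touch memory).
using LocId = int;
using Block = std::vector<LocId>;

// Memoizes backward scans keyed on (location, scan start), in the spirit of
// Map<pair<MemoryLoc, Instruction>, MemDepResult>.
class CachedScanner {
  const Block &BB;
  std::map<std::pair<LocId, std::size_t>, std::optional<std::size_t>> Cache;

public:
  std::size_t Examined = 0; // instructions actually walked (cache misses only)

  explicit CachedScanner(const Block &B) : BB(B) {}

  // Returns the index of the nearest earlier write to Loc before Start.
  std::optional<std::size_t> query(LocId Loc, std::size_t Start) {
    auto Key = std::make_pair(Loc, Start);
    if (auto It = Cache.find(Key); It != Cache.end())
      return It->second; // repeated query: no instructions re-examined
    std::optional<std::size_t> Result;
    for (std::size_t I = Start; I != 0;) {
      --I;
      ++Examined;
      if (BB[I] == Loc) {
        Result = I;
        break;
      }
    }
    Cache.emplace(Key, Result);
    return Result;
  }
};
```

As the comment notes, this does not improve the worst case: every distinct (location, start) pair still pays for a full scan once. A real cache would presumably also need invalidation whenever DSE deletes an instruction, which the sketch ignores.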

Thanks for rebasing my changes, @davidxl! Since your updated version applies cleanly and matches the changes I wanted to make, I'm updating the code on Phabricator to match yours.

lgtm

Based on discussions, we will take this as the solution to tackle quadratic behavior in DSE until MemorySSA is in place.

This revision is now accepted and ready to land. Aug 25 2016, 9:38 AM

inglorion closed this revision. Aug 26 2016, 9:25 AM

Revision Contents

Path	Size
include/llvm/Analysis/MemoryDependenceAnalysis.h	16 lines
lib/Analysis/MemoryDependenceAnalysis.cpp	26 lines
lib/Transforms/Scalar/DeadStoreElimination.cpp	14 lines

Diff 42880

include/llvm/Analysis/MemoryDependenceAnalysis.h

Show First 20 Lines • Show All 342 Lines • ▼ Show 20 Lines	public:
/// Clean up memory in between runs		/// Clean up memory in between runs
void releaseMemory() override;		void releaseMemory() override;

/// getAnalysisUsage - Does not modify anything. It uses Value Numbering		/// getAnalysisUsage - Does not modify anything. It uses Value Numbering
/// and Alias Analysis.		/// and Alias Analysis.
///		///
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

		/// Some methods limit the number of instructions they will examine.
		/// The return value of this method is the default limit that will be
		/// used if no limit is explicitly passed in.
		unsigned getDefaultBlockScanLimit() const;

/// getDependency - Return the instruction on which a memory operation		/// getDependency - Return the instruction on which a memory operation
/// depends. See the class comment for more details. It is illegal to call		/// depends. See the class comment for more details. It is illegal to call
/// this on non-memory instructions.		/// this on non-memory instructions.
MemDepResult getDependency(Instruction *QueryInst);		MemDepResult getDependency(Instruction *QueryInst);

/// getNonLocalCallDependency - Perform a full dependency query for the		/// getNonLocalCallDependency - Perform a full dependency query for the
/// specified call, returning the set of blocks that the value is		/// specified call, returning the set of blocks that the value is
/// potentially live across. The returned set of results will include a		/// potentially live across. The returned set of results will include a
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	public:
/// \brief Return the instruction on which a memory location depends.		/// \brief Return the instruction on which a memory location depends.
/// If isLoad is true, this routine ignores may-aliases with read-only		/// If isLoad is true, this routine ignores may-aliases with read-only
/// operations. If isLoad is false, this routine ignores may-aliases		/// operations. If isLoad is false, this routine ignores may-aliases
/// with reads from read-only locations. If possible, pass the query		/// with reads from read-only locations. If possible, pass the query
/// instruction as well; this function may take advantage of the metadata		/// instruction as well; this function may take advantage of the metadata
/// annotated to the query instruction to refine the result.		/// annotated to the query instruction to refine the result.
///		///
/// Note that this is an uncached query, and thus may be inefficient.		/// Note that this is an uncached query, and thus may be inefficient.
		/// Limit can be used to set the maximum number of instructions that
		/// will be examined to find the pointer dependency. On return, it will
		/// be set to the number of instructions left to examine. If a null pointer
		/// is passed in, the limit will default to the value of the
		/// -memdep-block-scan-limit parameter.
///		///
MemDepResult getPointerDependencyFrom(const MemoryLocation &Loc,		MemDepResult getPointerDependencyFrom(const MemoryLocation &Loc,
bool isLoad,		bool isLoad,
BasicBlock::iterator ScanIt,		BasicBlock::iterator ScanIt,
BasicBlock *BB,		BasicBlock *BB,
Instruction *QueryInst = nullptr);		Instruction *QueryInst = nullptr,
		unsigned *Limit = nullptr);

MemDepResult getSimplePointerDependencyFrom(const MemoryLocation &MemLoc,		MemDepResult getSimplePointerDependencyFrom(const MemoryLocation &MemLoc,
bool isLoad,		bool isLoad,
BasicBlock::iterator ScanIt,		BasicBlock::iterator ScanIt,
BasicBlock *BB,		BasicBlock *BB,
Instruction *QueryInst);		Instruction *QueryInst,
		unsigned *Limit = nullptr);

/// This analysis looks for other loads and stores with invariant.group		/// This analysis looks for other loads and stores with invariant.group
/// metadata and the same pointer operand. Returns Unknown if it does not		/// metadata and the same pointer operand. Returns Unknown if it does not
/// find anything, and Def if it can be assumed that 2 instructions load or		/// find anything, and Def if it can be assumed that 2 instructions load or
/// store the same value.		/// store the same value.
/// FIXME: This analysis works only on single block because of restrictions		/// FIXME: This analysis works only on single block because of restrictions
/// at the call site.		/// at the call site.
MemDepResult getInvariantGroupPointerDependency(LoadInst *LI,		MemDepResult getInvariantGroupPointerDependency(LoadInst *LI,
Show All 40 Lines

lib/Analysis/MemoryDependenceAnalysis.cpp

Show First 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
///		///
void MemoryDependenceAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {		void MemoryDependenceAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesAll();		AU.setPreservesAll();
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequiredTransitive<AAResultsWrapperPass>();		AU.addRequiredTransitive<AAResultsWrapperPass>();
AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();		AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();
}		}

		unsigned MemoryDependenceAnalysis::getDefaultBlockScanLimit() const {
		return BlockScanLimit;
		}

bool MemoryDependenceAnalysis::runOnFunction(Function &F) {		bool MemoryDependenceAnalysis::runOnFunction(Function &F) {
AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
DominatorTreeWrapperPass *DTWP =		DominatorTreeWrapperPass *DTWP =
getAnalysisIfAvailable<DominatorTreeWrapperPass>();		getAnalysisIfAvailable<DominatorTreeWrapperPass>();
DT = DTWP ? &DTWP->getDomTree() : nullptr;		DT = DTWP ? &DTWP->getDomTree() : nullptr;
TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
return false;		return false;
▲ Show 20 Lines • Show All 264 Lines • ▼ Show 20 Lines
/// getPointerDependencyFrom - Return the instruction on which a memory		/// getPointerDependencyFrom - Return the instruction on which a memory
/// location depends. If isLoad is true, this routine ignores may-aliases with		/// location depends. If isLoad is true, this routine ignores may-aliases with
/// read-only operations. If isLoad is false, this routine ignores may-aliases		/// read-only operations. If isLoad is false, this routine ignores may-aliases
/// with reads from read-only locations. If possible, pass the query		/// with reads from read-only locations. If possible, pass the query
/// instruction as well; this function may take advantage of the metadata		/// instruction as well; this function may take advantage of the metadata
/// annotated to the query instruction to refine the result.		/// annotated to the query instruction to refine the result.
MemDepResult MemoryDependenceAnalysis::getPointerDependencyFrom(		MemDepResult MemoryDependenceAnalysis::getPointerDependencyFrom(
const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,		const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
BasicBlock *BB, Instruction *QueryInst) {		BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {

if (QueryInst != nullptr) {		if (QueryInst != nullptr) {
if (auto *LI = dyn_cast<LoadInst>(QueryInst)) {		if (auto *LI = dyn_cast<LoadInst>(QueryInst)) {
MemDepResult invariantGroupDependency =		MemDepResult invariantGroupDependency =
getInvariantGroupPointerDependency(LI, BB);		getInvariantGroupPointerDependency(LI, BB);

if (invariantGroupDependency.isDef())		if (invariantGroupDependency.isDef())
return invariantGroupDependency;		return invariantGroupDependency;
}		}
}		}
return getSimplePointerDependencyFrom(MemLoc, isLoad, ScanIt, BB, QueryInst);		return getSimplePointerDependencyFrom(MemLoc, isLoad, ScanIt, BB, QueryInst,
		Limit);
}		}

MemDepResult		MemDepResult
MemoryDependenceAnalysis::getInvariantGroupPointerDependency(LoadInst *LI,		MemoryDependenceAnalysis::getInvariantGroupPointerDependency(LoadInst *LI,
BasicBlock *BB) {		BasicBlock *BB) {
Value *LoadOperand = LI->getPointerOperand();		Value *LoadOperand = LI->getPointerOperand();
// It's is not safe to walk the use list of global value, because function		// It's is not safe to walk the use list of global value, because function
// passes aren't allowed to look outside their functions.		// passes aren't allowed to look outside their functions.
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	for (Use &Us : Ptr->uses()) {
return MemDepResult::getDef(U);		return MemDepResult::getDef(U);
}		}
}		}
return Result;		return Result;
}		}

MemDepResult MemoryDependenceAnalysis::getSimplePointerDependencyFrom(		MemDepResult MemoryDependenceAnalysis::getSimplePointerDependencyFrom(
const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,		const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
BasicBlock *BB, Instruction *QueryInst) {		BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {

		if (!Limit) {
		unsigned DefaultLimit = BlockScanLimit;
		return getSimplePointerDependencyFrom(MemLoc, isLoad, ScanIt, BB, QueryInst,
		&DefaultLimit);
		}

const Value *MemLocBase = nullptr;		const Value *MemLocBase = nullptr;
int64_t MemLocOffset = 0;		int64_t MemLocOffset = 0;
unsigned Limit = BlockScanLimit;
bool isInvariantLoad = false;		bool isInvariantLoad = false;

// We must be careful with atomic accesses, as they may allow another thread		// We must be careful with atomic accesses, as they may allow another thread
// to touch this location, cloberring it. We are conservative: if the		// to touch this location, cloberring it. We are conservative: if the
// QueryInst is not a simple (non-atomic) memory access, we automatically		// QueryInst is not a simple (non-atomic) memory access, we automatically
// return getClobber.		// return getClobber.
// If it is simple, we know based on the results of		// If it is simple, we know based on the results of
// "Compiler testing via a theory of sound optimisations in the C11/C++11		// "Compiler testing via a theory of sound optimisations in the C11/C++11
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	while (ScanIt != BB->begin()) {
Instruction *Inst = &*--ScanIt;		Instruction *Inst = &*--ScanIt;

if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst))		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst))
// Debug intrinsics don't (and can't) cause dependencies.		// Debug intrinsics don't (and can't) cause dependencies.
if (isa<DbgInfoIntrinsic>(II)) continue;		if (isa<DbgInfoIntrinsic>(II)) continue;

// Limit the amount of scanning we do so we don't end up with quadratic		// Limit the amount of scanning we do so we don't end up with quadratic
// running time on extreme testcases.		// running time on extreme testcases.
--Limit;		--*Limit;
if (!Limit)		if (!*Limit)
return MemDepResult::getUnknown();		return MemDepResult::getUnknown();

if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {
// If we reach a lifetime begin or end marker, then the query ends here		// If we reach a lifetime begin or end marker, then the query ends here
// because the value is undefined.		// because the value is undefined.
if (II->getIntrinsicID() == Intrinsic::lifetime_start) {		if (II->getIntrinsicID() == Intrinsic::lifetime_start) {
// FIXME: This only considers queries directly on the invariant-tagged		// FIXME: This only considers queries directly on the invariant-tagged
// pointer, not on query pointers that are indexed off of them. It'd		// pointer, not on query pointers that are indexed off of them. It'd
Show All 19 Lines	if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {
if (!QueryInst)		if (!QueryInst)
// Original QueryInst may be volatile		// Original QueryInst may be volatile
return MemDepResult::getClobber(LI);		return MemDepResult::getClobber(LI);
if (isVolatile(QueryInst))		if (isVolatile(QueryInst))
// Ordering required if QueryInst is itself volatile		// Ordering required if QueryInst is itself volatile
return MemDepResult::getClobber(LI);		return MemDepResult::getClobber(LI);
// Otherwise, volatile doesn't imply any special ordering		// Otherwise, volatile doesn't imply any special ordering
}		}

// Atomic loads have complications involved.		// Atomic loads have complications involved.
// A Monotonic (or higher) load is OK if the query inst is itself not atomic.		// A Monotonic (or higher) load is OK if the query inst is itself not atomic.
// FIXME: This is overly conservative.		// FIXME: This is overly conservative.
if (LI->isAtomic() && LI->getOrdering() > Unordered) {		if (LI->isAtomic() && LI->getOrdering() > Unordered) {
if (!QueryInst)		if (!QueryInst)
return MemDepResult::getClobber(LI);		return MemDepResult::getClobber(LI);
if (LI->getOrdering() != Monotonic)		if (LI->getOrdering() != Monotonic)
return MemDepResult::getClobber(LI);		return MemDepResult::getClobber(LI);
▲ Show 20 Lines • Show All 396 Lines • ▼ Show 20 Lines	getNonLocalPointerDependency(Instruction *QueryInst,
const MemoryLocation Loc = MemoryLocation::get(QueryInst);		const MemoryLocation Loc = MemoryLocation::get(QueryInst);
bool isLoad = isa<LoadInst>(QueryInst);		bool isLoad = isa<LoadInst>(QueryInst);
BasicBlock *FromBB = QueryInst->getParent();		BasicBlock *FromBB = QueryInst->getParent();
assert(FromBB);		assert(FromBB);

assert(Loc.Ptr->getType()->isPointerTy() &&		assert(Loc.Ptr->getType()->isPointerTy() &&
"Can't get pointer deps of a non-pointer!");		"Can't get pointer deps of a non-pointer!");
Result.clear();		Result.clear();

// This routine does not expect to deal with volatile instructions.		// This routine does not expect to deal with volatile instructions.
// Doing so would require piping through the QueryInst all the way through.		// Doing so would require piping through the QueryInst all the way through.
// TODO: volatiles can't be elided, but they can be reordered with other		// TODO: volatiles can't be elided, but they can be reordered with other
// non-volatile accesses.		// non-volatile accesses.

// We currently give up on any instruction which is ordered, but we do handle		// We currently give up on any instruction which is ordered, but we do handle
// atomic instructions which are unordered.		// atomic instructions which are unordered.
// TODO: Handle ordered instructions		// TODO: Handle ordered instructions
▲ Show 20 Lines • Show All 787 Lines • Show Last 20 Lines

lib/Transforms/Scalar/DeadStoreElimination.cpp

Show First 20 Lines • Show All 481 Lines • ▼ Show 20 Lines
// DSE Pass		// DSE Pass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

bool DSE::runOnBasicBlock(BasicBlock &BB) {		bool DSE::runOnBasicBlock(BasicBlock &BB) {
const DataLayout &DL = BB.getModule()->getDataLayout();		const DataLayout &DL = BB.getModule()->getDataLayout();
bool MadeChange = false;		bool MadeChange = false;

// Do a top-down walk on the BB.		// Do a top-down walk on the BB.
for (BasicBlock::iterator BBI = BB.begin(), BBE = BB.end(); BBI != BBE; ) {		for (BasicBlock::iterator BBI = BB.begin(), BBE = BB.end(); BBI != BBE; ) {
Instruction *Inst = &*BBI++;		Instruction *Inst = &*BBI++;

// Handle 'free' calls specially.		// Handle 'free' calls specially.
if (CallInst *F = isFreeCall(Inst, TLI)) {		if (CallInst *F = isFreeCall(Inst, TLI)) {
MadeChange |= HandleFree(F);		MadeChange |= HandleFree(F);
continue;		continue;
}		}

Show All 15 Lines	if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {
if (!NextInst) // Next instruction deleted.		if (!NextInst) // Next instruction deleted.
BBI = BB.begin();		BBI = BB.begin();
else if (BBI != BB.begin()) // Revisit this instruction if possible.		else if (BBI != BB.begin()) // Revisit this instruction if possible.
--BBI;		--BBI;
++NumRedundantStores;		++NumRedundantStores;
MadeChange = true;		MadeChange = true;
};		};

if (LoadInst *DepLoad = dyn_cast<LoadInst>(SI->getValueOperand())) {		if (LoadInst *DepLoad = dyn_cast<LoadInst>(SI->getValueOperand())) {
if (SI->getPointerOperand() == DepLoad->getPointerOperand() &&		if (SI->getPointerOperand() == DepLoad->getPointerOperand() &&
isRemovable(SI) &&		isRemovable(SI) &&
MemoryIsNotModifiedBetween(DepLoad, SI)) {		MemoryIsNotModifiedBetween(DepLoad, SI)) {

DEBUG(dbgs() << "DSE: Remove Store Of Load from same pointer:\n "		DEBUG(dbgs() << "DSE: Remove Store Of Load from same pointer:\n "
<< "LOAD: " << *DepLoad << "\n STORE: " << *SI << '\n');		<< "LOAD: " << *DepLoad << "\n STORE: " << *SI << '\n');

RemoveDeadInstAndUpdateBBI(SI);		RemoveDeadInstAndUpdateBBI(SI);
Show All 30 Lines	for (BasicBlock::iterator BBI = BB.begin(), BBE = BB.end(); BBI != BBE; ) {

// Figure out what location is being stored to.		// Figure out what location is being stored to.
MemoryLocation Loc = getLocForWrite(Inst, *AA);		MemoryLocation Loc = getLocForWrite(Inst, *AA);

// If we didn't get a useful location, fail.		// If we didn't get a useful location, fail.
if (!Loc.Ptr)		if (!Loc.Ptr)
continue;		continue;

		// Loop until we find a store we can eliminate or a load that
		// invalidates the analysis. Without an upper bound on the number of
		// instructions examined, this analysis can become very time-consuming.
		// However, the potential gain diminishes as we process more instructions
		// without eliminating any of them. Therefore, we limit the number of
		// instructions we look at.
		auto Limit = MD->getDefaultBlockScanLimit();
while (InstDep.isDef() || InstDep.isClobber()) {		while (InstDep.isDef() || InstDep.isClobber()) {
// Get the memory clobbered by the instruction we depend on. MemDep will		// Get the memory clobbered by the instruction we depend on. MemDep will
// skip any instructions that 'Loc' clearly doesn't interact with. If we		// skip any instructions that 'Loc' clearly doesn't interact with. If we
// end up depending on a may- or must-aliased load, then we can't optimize		// end up depending on a may- or must-aliased load, then we can't optimize
// away the store and we bail out. However, if we depend on on something		// away the store and we bail out. However, if we depend on on something
// that overwrites the memory location we can potentially optimize it.		// that overwrites the memory location we can potentially optimize it.
//		//
// Find out what memory location the dependent instruction stores.		// Find out what memory location the dependent instruction stores.
▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	while (InstDep.isDef() || InstDep.isClobber()) {
// we can remove the first store to P even though we don't know if P and Q		// we can remove the first store to P even though we don't know if P and Q
// alias.		// alias.
if (DepWrite == &BB.front()) break;		if (DepWrite == &BB.front()) break;

// Can't look past this instruction if it might read 'Loc'.		// Can't look past this instruction if it might read 'Loc'.
if (AA->getModRefInfo(DepWrite, Loc) & MRI_Ref)		if (AA->getModRefInfo(DepWrite, Loc) & MRI_Ref)
break;		break;

InstDep = MD->getPointerDependencyFrom(Loc, false,		InstDep = MD->getPointerDependencyFrom(Loc, /*isLoad=*/ false,
DepWrite->getIterator(), &BB);		DepWrite->getIterator(), &BB,
		/*QueryInst=*/ nullptr, &Limit);
}		}
}		}

// If this block ends in a return, unwind, or unreachable, all allocas are		// If this block ends in a return, unwind, or unreachable, all allocas are
// dead at its end, which means stores to them are also dead.		// dead at its end, which means stores to them are also dead.
if (BB.getTerminator()->getNumSuccessors() == 0)		if (BB.getTerminator()->getNumSuccessors() == 0)
MadeChange |= handleEndBlock(BB);		MadeChange |= handleEndBlock(BB);

▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	for (BasicBlock::iterator BBI = BB.end(); BBI != BB.begin(); ){
if (isa<AllocaInst>(BBI)) {		if (isa<AllocaInst>(BBI)) {
// Remove allocas from the list of dead stack objects; there can't be		// Remove allocas from the list of dead stack objects; there can't be
// any references before the definition.		// any references before the definition.
DeadStackObjects.remove(&*BBI);		DeadStackObjects.remove(&*BBI);
continue;		continue;
}		}

if (auto CS = CallSite(&*BBI)) {		if (auto CS = CallSite(&*BBI)) {
// Remove allocation function calls from the list of dead stack objects;		// Remove allocation function calls from the list of dead stack objects;
// there can't be any references before the definition.		// there can't be any references before the definition.
if (isAllocLikeFn(&*BBI, TLI))		if (isAllocLikeFn(&*BBI, TLI))
DeadStackObjects.remove(&*BBI);		DeadStackObjects.remove(&*BBI);

// If this call does not access memory, it can't be loading any of our		// If this call does not access memory, it can't be loading any of our
// pointers.		// pointers.
if (AA->doesNotAccessMemory(CS))		if (AA->doesNotAccessMemory(CS))
continue;		continue;
▲ Show 20 Lines • Show All 77 Lines • Show Last 20 Lines