This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Implement dead store elimination using MemorySSA (disabled by default).
Abandoned, Public

Authored by mcrosier on Jul 20 2016, 7:00 AM.

Details

Reviewers

gberry
• dberlin

Summary

This patch implements dead store elimination using the MemorySSA framework. The implementation is intentionally written to be very similar to the non-MemorySSA algorithm, and by no means does it exploit the full capabilities of MemorySSA. My rationale is that I wanted to keep it simple for myself and the reviewer, since I'm very new to MemorySSA. I was also hoping this would allow an apples-to-apples comparison between the two implementations (mostly in terms of compile time).

My longer-term goal is to implement global DSE using MemorySSA. A version of non-local DSE was attempted in the past (D13363), but was reverted due to compile-time regressions.

Please take a look,
Chad

Diff Detail

Event Timeline

mcrosier updated this revision to Diff 64680. Jul 20 2016, 7:00 AM

mcrosier retitled this revision from to [DSE] Implement dead store elimination using MemorySSA (disabled by default)..

mcrosier updated this object.

mcrosier added reviewers: gberry, • dberlin.

mcrosier updated this object.

mcrosier added a subscriber: llvm-commits.

junbuml added a subscriber: junbuml. Jul 20 2016, 7:12 AM

Thanks for doing this!

Just as a meta-comment:

One of the advantages of MemorySSA is that it allows sparse analysis. Now, for stores, because it's SSA and not SSI, you do end up having to worklist uses more often (for loads, algorithms are as easy as scalar SSA algorithms).

If you use MemorySSA as a faster memdep, and try to write iterative algorithms, it may or may not be faster, because it's really built to be able to build ssa-like algorithms, and some of the "faster memdep" use cases don't overlap with this well.

That said, I understand you are trying to port this incrementally, so i restricted major algorithmic comments to where there are easy and significantly better ways to do a thing.

lib/Transforms/Scalar/DeadStoreElimination.cpp
1114	Interesting. MemorySSA looks at this the other way around. It guarantees that if you are loading from a pointer you just stored to, MemoryUse->getDefiningAccess() will be the store you just loaded from. (That is, it guarantees that load->getDefiningAccess() is the nearest dominating thing that actually aliases with the load.)

There is one edge case where it would be a MemoryPhi that you could eliminate, but it's a real nonlocal edge case: when the MemoryPhi's operands are all really the same thing.

if (a)
  1 = MemoryDef(liveOnEntry)
  store B, 5
else
  2 = MemoryDef(liveOnEntry)
  store B, 5
3 = MemoryPhi(1, 2)
MemoryUse(3)
load B

Not sure this case is worth optimizing. NewGVN actually handles this already and would just replace the load uses with "5".
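The MemoryPhi edge case in the comment on line 1114 can be pictured with a small toy model. This is only a sketch: `ToyStore` and `phiOperandsAgree` are hypothetical names, not LLVM classes or APIs, and the model assumes each phi operand is a plain store of a constant.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a store recorded as a MemoryDef (not the LLVM class).
struct ToyStore {
  std::string Pointer; // pointer being written, e.g. "B"
  int Value;           // constant being stored
};

// Returns true and sets Out when every incoming store writes the same
// value to the same pointer, so a load fed by the phi could use Out.
bool phiOperandsAgree(const std::vector<ToyStore> &Incoming, int &Out) {
  if (Incoming.empty())
    return false;
  for (const ToyStore &S : Incoming)
    if (S.Pointer != Incoming.front().Pointer ||
        S.Value != Incoming.front().Value)
      return false;
  Out = Incoming.front().Value;
  return true;
}
```

In the example from the comment, both branches store 5 to B, so the check succeeds and the load of B could be replaced by 5. A real implementation would also have to confirm that the load reads the same pointer.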
1117	Because of the above guarantee, the memory state is guaranteed to be the same as this store if and only if getMemoryAccess(DepLoad)->getDefiningAccess() == getMemoryAccess(SI). This is a constant time check instead of memoryIsNotModifiedBetween (which also can be done faster with MemorySSA, but ...)
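A toy illustration of why the check on line 1117 is constant time, as opposed to scanning instructions. `ToyAccess` and `seesSameMemoryState` are hypothetical names. The sketch assumes that "same memory state" means the load and the store share a defining access; that is an interpretation for illustration, not a transcription of the expression in the comment.

```cpp
#include <cassert>

// Toy MemorySSA access: each access records the def it depends on.
struct ToyAccess {
  const ToyAccess *Defining; // nullptr models liveOnEntry
};

// A single pointer comparison, so the cost is O(1). If both accesses name
// the same defining access, no def separates them in this toy model.
bool seesSameMemoryState(const ToyAccess &Load, const ToyAccess &Store) {
  return Load.Defining == Store.Defining;
}
```

The point is only that the answer comes from the access graph rather than from walking the instructions between the load and the store.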
1127	Doesn't GVN do this already?
1134	This whole block could be:

// For stores, getClobberingMemoryAccess will guarantee you get the nearest
// dominating def that actually aliases the store. It is not needed for loads
// (see the comments at the top of MemorySSA.h for why this is).
MemoryAccess *MA = MSSA->getClobberingMemoryAccess(SI);
if (MemoryDef *MD = dyn_cast<MemoryDef>(MA)) {
  if (isCallocLikeFn(MD->getMemoryInst()) && <pointers are the same>)
  ....

This is going to be a lot faster than the memoryIsNotModifiedBetween call.
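The suggestion on line 1134 can be modelled outside LLVM as a walk up a def chain. The sketch below is a toy under stated assumptions: `ToyDef`, `nearestClobber`, and `isRedundantZeroStore` are hypothetical, and aliasing is reduced to comparing integer object ids, which is far coarser than real alias analysis.

```cpp
#include <cassert>

enum class ToyKind { LiveOnEntry, Calloc, Store };

// Toy MemoryDef: what it is, which object it touches, and the previous def.
struct ToyDef {
  ToyKind Kind;
  int Object;             // hypothetical id of the underlying object
  const ToyDef *Defining; // previous def in the chain
};

// Walk up the def chain until a def touches the same object. This is a
// stand-in for getClobberingMemoryAccess; real aliasing is richer.
const ToyDef *nearestClobber(const ToyDef &Store) {
  const ToyDef *D = Store.Defining;
  while (D && D->Kind != ToyKind::LiveOnEntry && D->Object != Store.Object)
    D = D->Defining;
  return D;
}

// A store of zero is redundant when its nearest clobber is the calloc
// that produced the object it writes to.
bool isRedundantZeroStore(const ToyDef &Store, bool StoresZero) {
  const ToyDef *C = nearestClobber(Store);
  return StoresZero && C && C->Kind == ToyKind::Calloc &&
         C->Object == Store.Object;
}
```

A store to an unrelated object between the calloc and the zero store is skipped by the walk, while an earlier store to the same object stops it and keeps the zero store alive.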
1156	I'm a bit unclear what this is trying to do, so i can't definitely give advice, only guess. If you are trying to see if the only use of a store is a call to free, the following applies:

Unlike loads, store uses/defs are not guaranteed to may-alias the store (doing so requires allowing multiple phi nodes in MemorySSA). If they were, this would simply be "Process instructions in reverse order, worklist the uses of the store, if all of them are in calls to free, eliminate the store".

However, because of this, you want to do:

Process instructions in reverse order:
<worklist the initial uses of the MemoryDef for the store>
while(worklist not empty)
  Pull use off worklist.
  If use is free call, ignore it.
  If use is below free call (IE MSSA->dominates(memoryaccess for free, memoryaccess for use)), ignore it.
  (there are partially dead cases this will ignore, but let's ignore that for now)
  Otherwise:
   If use is a MemoryUse, you cannot remove the store (we already eliminated all uses after the free call above, so this must be between the store and the free call. Both your original code and this code presume that you have already eliminated loads of just stored pointers, etc, so they don't get in the way).
   If use is a MemoryDef, and getClobberingMemoryAccess(use) == original store you have store over store to the same data (IE partial or full overwrite). Not sure what you want to do here.
   If use is a MemoryDef, and getClobberingMemoryAccess(use) != original store, put the uses of this MemoryDef on worklist and continue.
   If use is a MemoryPhi, put uses of this MemoryPhi on worklist and continue.

This seems complicated, but is still going to be faster than what you have :)
1242	Errr, If it's a memoryDef, it must write to memory or otherwise affect memory ;)
1257	FWIW: You don't need this loop at all. Much like in the rewritten merged load store motion, you can figure out what it could possibly be eliminated in favor of by hashing, and then use local ordering to tell which is earlier/later. This is true even where they are partial overwrites (that just gets accounted for in the hash). Whether "memory is modified" (or you have a use in the way) is entirely contained in the def/use chains of MemorySSA (and MSSA->dominates); you don't need to look at the inst stream.

Even if you don't take the hashing approach, i would approach this sparsely, by starting at the top of the block, looking at each MemoryAccess, and doing something depending on the uses/defs it has. Then move to the next MemoryAccess. You should not have to visit everything more than once, even if you do it this way.

That said, i understand you are trying to incrementally port, so i won't force you to do this. Just sayin, if you want it to be faster, ...

• dberlin added inline comments. Jul 20 2016, 8:40 AM

lib/Transforms/Scalar/DeadStoreElimination.cpp

1156

(Note that the above starts from the store, instead of starting from the free, and that worklist must be a queue to get ordering right)

You can easily extend the above to be global as well, in the sense that it "does not care where the free call is" - in this block or not.

<worklist the initial uses of the MemoryDef for the store>
while(worklist not empty) 
  Pull use off front of worklist.
  If use is free call, put it on list of free calls.  
  If use is dominated by *any* free call in the list, ignore it (because it means we will hit a free before we hit that use, and thus the use is use-after-free)
  (there are partially dead cases this will ignore, but let's ignore that for now)
  Otherwise:
   If use is a MemoryUse, you cannot remove the store (we already eliminated all uses after the free call above, so this must be between the store and the free call. Both your original code and this code presume that you have already eliminated loads of just stored pointers, etc, so they don't get in the way).
   If use is a MemoryDef, and getClobberingMemoryAccess(use) == original store you have store over store to the same data (IE partial or full overwrite). Not sure what you want to do here.
   If use is a MemoryDef, and getClobberingMemoryAccess(use) != original store, put the uses of this MemoryDef on worklist and continue.
   If use is a MemoryPhi, put uses of this MemoryPhi on worklist and continue.

The above will handle:

store A

if (B)
  free A
else
  free A

Because it's a queue, and defs dominate uses, you are guaranteed it will process it in the right order. (IE even if you add a use(A) at the end it will get the right answer)

As an optimization, you can drop free calls from the list of things to check in some cases. Probably not worth doing since normal code will likely have 1 or 2 frees to track.
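The queue-based walk described above can be sketched on a toy access graph. Everything here is hypothetical and not the LLVM API: `Access`, `Kind`, the `Dominates` oracle (standing in for MSSA->dominates), and `onlyReachesFrees`. Three choices are assumptions of this sketch rather than part of the comment: the overwrite case that the comment leaves open is treated as "keep the store", a visited set is added so phi cycles terminate, and at least one free must be found. The sketch does not check that every path reaches a free.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <vector>

// Toy access kinds; Free stands for the MemoryDef of a call to free.
enum class Kind { Use, Def, Phi, Free };

struct Access {
  explicit Access(Kind K) : K(K) {}
  Kind K;
  std::vector<Access *> Users;     // accesses whose defining access is this
  const Access *Clobber = nullptr; // for a Def: its nearest aliasing def
};

// Hypothetical dominance oracle: true when the first access dominates
// the second.
using Dominates = std::function<bool(const Access *, const Access *)>;

bool onlyReachesFrees(Access &Store, const Dominates &Dom) {
  std::deque<Access *> Worklist(Store.Users.begin(), Store.Users.end());
  std::set<Access *> Seen(Store.Users.begin(), Store.Users.end());
  std::vector<Access *> Frees;

  // Queue the users of A that have not been visited yet.
  auto PushUsers = [&](Access *A) {
    for (Access *U : A->Users)
      if (Seen.insert(U).second)
        Worklist.push_back(U);
  };

  while (!Worklist.empty()) {
    Access *Use = Worklist.front(); // FIFO: pull from the front
    Worklist.pop_front();

    if (Use->K == Kind::Free) {
      Frees.push_back(Use);
      continue;
    }
    // A use dominated by any recorded free is a use-after-free; skip it.
    bool AfterFree = false;
    for (Access *F : Frees)
      AfterFree |= Dom(F, Use);
    if (AfterFree)
      continue;

    if (Use->K == Kind::Use)
      return false; // a read between the store and the free
    if (Use->K == Kind::Def && Use->Clobber == &Store)
      return false; // overwrite case left open above; assume "keep"
    PushUsers(Use); // non-clobbering Def or Phi: keep following uses
  }
  return !Frees.empty(); // assumption: require at least one free
}
```

A store whose only user is a free is reported as removable, a store with a read before the free is not, and a non-clobbering def in between is looked through by queueing its own users.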

george.burgess.iv added a subscriber: george.burgess.iv. Jul 21 2016, 11:46 AM

Abandoning this for now. Regardless, thanks for the feedback, Danny. Maybe I can convince Geoff to revive this someday.

davide added a subscriber: davide. Dec 26 2016, 7:37 AM

jfb added a subscriber: jfb. Jan 30 2019, 1:52 PM

Herald added a subscriber: Prazek. Jan 30 2019, 1:52 PM

Revision Contents

Path	Size
lib/Transforms/Scalar/DeadStoreElimination.cpp	403 lines
test/Transforms/DeadStoreElimination/simple.ll	1 line

Diff 64680

lib/Transforms/Scalar/DeadStoreElimination.cpp

[34 lines not shown]
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
		#include "llvm/Transforms/Utils/MemorySSA.h"
#include <map>		#include <map>
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "dse"		#define DEBUG_TYPE "dse"

STATISTIC(NumRedundantStores, "Number of redundant stores deleted");		STATISTIC(NumRedundantStores, "Number of redundant stores deleted");
STATISTIC(NumFastStores, "Number of stores deleted");		STATISTIC(NumFastStores, "Number of stores deleted");
STATISTIC(NumFastOther , "Number of other instrs removed");		STATISTIC(NumFastOther , "Number of other instrs removed");
STATISTIC(NumCompletePartials, "Number of stores dead by later partials");		STATISTIC(NumCompletePartials, "Number of stores dead by later partials");

static cl::opt<bool>		static cl::opt<bool>
EnablePartialOverwriteTracking("enable-dse-partial-overwrite-tracking",		EnablePartialOverwriteTracking("enable-dse-partial-overwrite-tracking",
cl::init(true), cl::Hidden,		cl::init(true), cl::Hidden,
cl::desc("Enable partial-overwrite tracking in DSE"));		cl::desc("Enable partial-overwrite tracking in DSE"));

		static cl::opt<bool>
		UseMemorySSA("use-memoryssa-dse", cl::Hidden, cl::init(false),
		cl::desc("Use MemorySSA for DeadStoreElimination"));

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Helper functions		// Helper functions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// Delete this instruction. Before we do, go through and zero out all the		/// Delete this instruction. Before we do, go through and zero out all the
/// operands of this instruction. If any of them become dead, delete them and		/// operands of this instruction. If any of them become dead, delete them and
/// the computation tree that feeds them.		/// the computation tree that feeds them.
[599 lines not shown]
}		}

/// Remove dead stores to stack-allocated locations in the function end block.		/// Remove dead stores to stack-allocated locations in the function end block.
/// Ex:		/// Ex:
/// %A = alloca i32		/// %A = alloca i32
/// ...		/// ...
/// store i32 1, i32* %A		/// store i32 1, i32* %A
/// ret void		/// ret void
static bool handleEndBlock(BasicBlock &BB, AliasAnalysis *AA,		static bool
MemoryDependenceResults *MD,		handleEndBlock(BasicBlock &BB, AliasAnalysis *AA, MemoryDependenceResults *MD,
const TargetLibraryInfo *TLI) {		const TargetLibraryInfo *TLI,
		SmallPtrSetImpl<Instruction *> *AccessesToDelete = nullptr) {
bool MadeChange = false;		bool MadeChange = false;

// Keep track of all of the stack objects that are dead at the end of the		// Keep track of all of the stack objects that are dead at the end of the
// function.		// function.
SmallSetVector<Value*, 16> DeadStackObjects;		SmallSetVector<Value*, 16> DeadStackObjects;

// Find all of the alloca'd pointers in the entry block.		// Find all of the alloca'd pointers in the entry block.
BasicBlock &Entry = BB.getParent()->front();		BasicBlock &Entry = BB.getParent()->front();
[42 lines not shown]	if (hasMemoryWrite(&*BBI, *TLI) && isRemovable(&*BBI)) {
E = Pointers.end(); I != E; ++I) {		E = Pointers.end(); I != E; ++I) {
dbgs() << **I;		dbgs() << **I;
if (std::next(I) != E)		if (std::next(I) != E)
dbgs() << ", ";		dbgs() << ", ";
}		}
dbgs() << '\n');		dbgs() << '\n');

// DCE instructions only used to calculate that store.		// DCE instructions only used to calculate that store.
		if (UseMemorySSA) {
		AccessesToDelete->insert(Dead);
		DeadStackObjects.remove(Dead);
		} else
deleteDeadInstruction(Dead, &BBI, *MD, *TLI, &DeadStackObjects);		deleteDeadInstruction(Dead, &BBI, *MD, *TLI, &DeadStackObjects);
++NumFastStores;		++NumFastStores;
MadeChange = true;		MadeChange = true;
continue;		continue;
}		}
}		}

// Remove any dead non-memory-mutating instructions.		// Remove any dead non-memory-mutating instructions.
if (isInstructionTriviallyDead(&*BBI, TLI)) {		if (isInstructionTriviallyDead(&*BBI, TLI)) {
DEBUG(dbgs() << "DSE: Removing trivially dead instruction:\n DEAD: "		DEBUG(dbgs() << "DSE: Removing trivially dead instruction:\n DEAD: "
<< *&*BBI << '\n');		<< *&*BBI << '\n');
		if (UseMemorySSA) {
		AccessesToDelete->insert(&*BBI);
		DeadStackObjects.remove(&*BBI);
		} else
deleteDeadInstruction(&*BBI, &BBI, *MD, *TLI, &DeadStackObjects);		deleteDeadInstruction(&*BBI, &BBI, *MD, *TLI, &DeadStackObjects);
++NumFastOther;		++NumFastOther;
MadeChange = true;		MadeChange = true;
continue;		continue;
}		}

if (isa<AllocaInst>(BBI)) {		if (isa<AllocaInst>(BBI)) {
// Remove allocas from the list of dead stack objects; there can't be		// Remove allocas from the list of dead stack objects; there can't be
// any references before the definition.		// any references before the definition.
[279 lines not shown]	for (BasicBlock &BB : F)
// Only check non-dead blocks. Dead blocks may have strange pointer		// Only check non-dead blocks. Dead blocks may have strange pointer
// cycles that will confuse alias analysis.		// cycles that will confuse alias analysis.
if (DT->isReachableFromEntry(&BB))		if (DT->isReachableFromEntry(&BB))
MadeChange |= eliminateDeadStores(BB, AA, MD, DT, TLI);		MadeChange |= eliminateDeadStores(BB, AA, MD, DT, TLI);
return MadeChange;		return MadeChange;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// MemorySSA implementation below.
		//
		// TODO: Delete instructions without invalidating the iterator.
		// TODO: Optimize dead stores in dominating blocks in handleFreeMSSA.
		//===----------------------------------------------------------------------===//

		static MemoryLocation getLocForReadMSSA(Instruction *Inst) {
		if (auto *LI = dyn_cast<LoadInst>(Inst))
		return MemoryLocation::get(LI);
		if (auto *MTI = dyn_cast<MemTransferInst>(Inst))
		return MemoryLocation::getForSource(MTI);

		// FIXME: Handle more instructions (e.g., vaarg, atomics).
		return MemoryLocation();
		}

		/// Delete this instruction. Before we do, go through and zero out all the
		/// operands of this instruction. If any of them become dead, delete them and
		/// the computation tree that feeds them.
		static void deleteDeadInstructionMSSA(Instruction *I, MemorySSA *MSSA,
		const TargetLibraryInfo *TLI) {
		SmallVector<Instruction *, 32> NowDeadInsts;

		NowDeadInsts.push_back(I);
		--NumFastOther;

		do {
		Instruction *DeadInst = NowDeadInsts.pop_back_val();
		++NumFastOther;

		// Before we touch this instruction, remove it from MSSA!
		if (MemoryAccess *MA = MSSA->getMemoryAccess(DeadInst))
		MSSA->removeMemoryAccess(MA);

		for (unsigned op = 0, e = DeadInst->getNumOperands(); op != e; ++op) {
		Value *Op = DeadInst->getOperand(op);
		DeadInst->setOperand(op, nullptr);

		// If this operand just became dead, add it to the NowDeadInsts list.
		if (!Op->use_empty())
		continue;

		if (Instruction *OpI = dyn_cast<Instruction>(Op))
		if (isInstructionTriviallyDead(OpI, TLI))
		NowDeadInsts.push_back(OpI);
		}

		DeadInst->eraseFromParent();
		} while (!NowDeadInsts.empty());
		}

		static bool eliminateNoopStoreMSSA(Instruction *Inst, AliasAnalysis *AA,
		const DataLayout &DL,
		const TargetLibraryInfo *TLI) {
		// Must be a store instruction.
		StoreInst *SI = dyn_cast<StoreInst>(Inst);
		if (!SI)
		return false;

		// If we're storing the same value back to a pointer that we just loaded from,
		// then the store can be removed.
		if (LoadInst *DepLoad = dyn_cast<LoadInst>(SI->getValueOperand())) {
		if (SI->getPointerOperand() == DepLoad->getPointerOperand() &&
		isRemovable(SI) && memoryIsNotModifiedBetween(DepLoad, SI, AA)) {

		DEBUG(dbgs() << "DSE: Remove Store Of Load from same pointer:\n LOAD: "
		<< *DepLoad << "\n STORE: " << *SI << '\n');

		++NumRedundantStores;
		return true;
		}
		}

		// Remove null stores into the calloc'ed objects
		Constant *StoredConstant = dyn_cast<Constant>(SI->getValueOperand());
		if (StoredConstant && StoredConstant->isNullValue() && isRemovable(SI)) {
		Instruction *UnderlyingPointer =
		dyn_cast<Instruction>(GetUnderlyingObject(SI->getPointerOperand(), DL));

		if (UnderlyingPointer && isCallocLikeFn(UnderlyingPointer, TLI) &&
		memoryIsNotModifiedBetween(UnderlyingPointer, SI, AA)) {
		DEBUG(
		dbgs() << "DSE: Remove null store to the calloc'ed object:\n DEAD: "
		<< *Inst << "\n OBJECT: " << *UnderlyingPointer << '\n');

		++NumRedundantStores;
		return true;
		}
		}
		return false;
		}

		/// Handle frees of entire structures whose dependency is a store
		/// to a field of that structure.
		static bool handleFreeMSSA(CallInst *F, AliasAnalysis *AA, MemorySSA *MSSA,
		const TargetLibraryInfo *TLI,
		SmallPtrSetImpl<Instruction *> *AccessesToDelete) {
		bool MadeChange = false;
		const DataLayout &DL = F->getModule()->getDataLayout();

		MemoryAccess *FreeMA = MSSA->getMemoryAccess(F);

		// Iterate over all local memory accesses.
		auto *BlockAccesses = MSSA->getBlockAccesses(F->getParent());
		assert(BlockAccesses && "Expected memory accesses in block.");

		MemorySSA::AccessList::const_reverse_iterator I(FreeMA->getIterator());
		for (auto E = BlockAccesses->rend(); I != E; I++) {
		// If we have a use that may-aliases we must give up.
		if (const MemoryUse *MU = dyn_cast<MemoryUse>(&*I)) {
		Instruction *DepRead = MU->getMemoryInst();
		assert(DepRead && "Expected an associated inst with memory use.");

		MemoryLocation ReadLoc = getLocForReadMSSA(DepRead);
		if (!ReadLoc.Ptr ||
		!AA->isNoAlias(MemoryLocation(F->getArgOperand(0)), ReadLoc))
		break;
		continue;
		}

		// Check to see if this def clobbers FirstWrite.
		if (const MemoryDef *MD2 = dyn_cast<MemoryDef>(&*I)) {
		Instruction *DepWrite = MD2->getMemoryInst();
		if (!DepWrite)
		break;

		if (!hasMemoryWrite(DepWrite, *TLI) || !isRemovable(DepWrite))
		break;

		Value *DepPointer =
		GetUnderlyingObject(getStoredPointerOperand(DepWrite), DL);

		if (!AA->isMustAlias(F->getArgOperand(0), DepPointer))
		break;

		DEBUG(dbgs() << "DSE: Dead Store to soon to be freed memory:\n DEAD: "
		<< *DepWrite << '\n');

		AccessesToDelete->insert(DepWrite);
		++NumFastStores;
		MadeChange = true;

		// Inst's old Dependency is now deleted. Compute the next dependency,
		// which may also be dead, as in
		// s[0] = 0;
		// s[1] = 0; // This has just been deleted.
		// free(s);
		continue;
		}
		}
		// FIXME: Look at unconditional predecessors.

		return MadeChange;
		}

		static bool eliminateDeadStores(BasicBlock *BB, AliasAnalysis *AA,
		MemorySSA *MSSA, DominatorTree *DT,
		const TargetLibraryInfo *TLI) {
		bool MadeChange = false;

		// A map of interval maps representing partially-overwritten value parts.
		InstOverlapIntervalsTy IOL;

		const MemorySSA::AccessList *BlockAccesses = MSSA->getBlockAccesses(BB);
		if (!BlockAccesses)
		return false;

		SmallPtrSet<Instruction *, 8> AccessesToDelete;
		const DataLayout &DL = BB->getModule()->getDataLayout();

		// Walk accesses in the block.
		MemorySSA::AccessList::const_iterator I = BlockAccesses->begin(),
		E = BlockAccesses->end();
		for (; I != E; ++I) {
		const MemoryDef *MD = dyn_cast<MemoryDef>(&*I);
		if (!MD)
		continue;

		// Handle 'free' calls specially.
		if (Instruction *FreeInst = MD->getMemoryInst()) {
		if (CallInst *F = isFreeCall(FreeInst, TLI)) {
		MadeChange |= handleFreeMSSA(F, AA, MSSA, TLI, &AccessesToDelete);
		continue;
		}
		}

		// Check to see if Inst writes to memory. If not, continue.
		Instruction *Inst = MD->getMemoryInst();
		if (!Inst || !hasMemoryWrite(Inst, *TLI))
		continue;

		if (eliminateNoopStoreMSSA(Inst, AA, DL, TLI)) {
		MadeChange = true;
		AccessesToDelete.insert(Inst);
		continue;
		}

		// Figure out what location is being stored to. If we don't get a useful
		// location, fail.
		MemoryLocation Loc = getLocForWrite(Inst, *AA);
		if (!Loc.Ptr)
		continue;

		// FIXME: Non-local DSE would be fun. :)
		MemorySSA::AccessList::const_reverse_iterator RI(I->getIterator());
		for (auto RE = BlockAccesses->rend(); RI != RE; ++RI) {
		// Get the memory clobbered by the instruction we depend on. If we end up
		// depending on a may- or must-aliased load, then we can't optimize away
		// the store and we bail out. However, if we depend on something that
		// overwrites the memory location we can potentially optimize it.
		// Quit when we hit the block's phi node.
		if (isa<MemoryPhi>(*RI))
		break;

		// If we have a use that may-aliases with Inst we must give up.
		if (const MemoryUse *MU = dyn_cast<MemoryUse>(&*RI)) {
		Instruction *DepRead = MU->getMemoryInst();
		assert(DepRead && "Expected an associated inst with memory use.");

		MemoryLocation ReadLoc = getLocForReadMSSA(DepRead);
		if (!ReadLoc.Ptr || !AA->isNoAlias(Loc, ReadLoc))
		break;

		continue;
		}

		// Find out what memory location the dependent instruction stores.
		const MemoryDef *InstDep = cast<MemoryDef>(&*RI);
		Instruction *DepWrite = InstDep->getMemoryInst();
		if (!DepWrite)
		break;

		// If we find a write that is a) removable (i.e., non-volatile), b) is
		// completely obliterated by the store to 'Loc', and c) which we know that
		// 'Inst' doesn't load from, then we can remove it.
		MemoryLocation DepLoc = getLocForWrite(DepWrite, *AA);
		if (DepLoc.Ptr && isRemovable(DepWrite) &&
		!isPossibleSelfRead(Inst, Loc, DepWrite, *TLI, *AA)) {
		int64_t InstWriteOffset, DepWriteOffset;
		OverwriteResult OR = isOverwrite(Loc, DepLoc, DL, *TLI, DepWriteOffset,
		InstWriteOffset, DepWrite, IOL);
		if (OR == OverwriteComplete) {
		DEBUG(dbgs() << "DSE: Remove Dead Store:\n DEAD: " << *DepWrite
		<< "\n KILLER: " << *Inst << '\n');

		// Delete the store and now-dead instructions that feed it.
		AccessesToDelete.insert(DepWrite);
		++NumFastStores;
		MadeChange = true;

		// We erased DepWrite; start over.
		continue;
		} else if ((OR == OverwriteEnd && isShortenableAtTheEnd(DepWrite)) ||
		((OR == OverwriteBegin &&
		isShortenableAtTheBeginning(DepWrite)))) {
		// TODO: base this on the target vector size so that if the earlier
		// store was too small to get vector writes anyway then its likely
		// a good idea to shorten it
		// Power of 2 vector writes are probably always a bad idea to optimize
		// as any store/memset/memcpy is likely using vector instructions so
		// shortening it to not vector size is likely to be slower
		MemIntrinsic *DepIntrinsic = cast<MemIntrinsic>(DepWrite);
		unsigned DepWriteAlign = DepIntrinsic->getAlignment();
		bool IsOverwriteEnd = (OR == OverwriteEnd);
		if (!IsOverwriteEnd)
		InstWriteOffset = int64_t(InstWriteOffset + Loc.Size);

		if ((llvm::isPowerOf2_64(InstWriteOffset) &&
		DepWriteAlign <= InstWriteOffset) ||
		((DepWriteAlign != 0) && InstWriteOffset % DepWriteAlign == 0)) {

		DEBUG(dbgs() << "DSE: Remove Dead Store:\n OW "
		<< (IsOverwriteEnd ? "END" : "BEGIN") << ": "
		<< *DepWrite << "\n KILLER (offset "
		<< InstWriteOffset << ", " << DepLoc.Size << ")"
		<< *Inst << '\n');

		int64_t NewLength =
		IsOverwriteEnd
		? InstWriteOffset - DepWriteOffset
		: DepLoc.Size - (InstWriteOffset - DepWriteOffset);

		Value *DepWriteLength = DepIntrinsic->getLength();
		Value *TrimmedLength =
		ConstantInt::get(DepWriteLength->getType(), NewLength);
		DepIntrinsic->setLength(TrimmedLength);

		if (!IsOverwriteEnd) {
		int64_t OffsetMoved = (InstWriteOffset - DepWriteOffset);
		Value *Indices[1] = {
		ConstantInt::get(DepWriteLength->getType(), OffsetMoved)};
		GetElementPtrInst *NewDestGEP = GetElementPtrInst::CreateInBounds(
		DepIntrinsic->getRawDest(), Indices, "", DepWrite);
		DepIntrinsic->setDest(NewDestGEP);
		}
		MadeChange = true;
		}
		}
		}

		// If this is a may-aliased store that is clobbering the store value, we
		// can keep searching past it for another must-aliased pointer that stores
		// to the same location. For example, in:
		// store -> P
		// store -> Q
		// store -> P
		// we can remove the first store to P even though we don't know if P and Q
		// alias.

		// Can't look past this instruction if it might read 'Loc'.
		if (AA->getModRefInfo(DepWrite, Loc) & MRI_Ref)
		break;
		}
		}

		// If this block ends in a return, unwind, or unreachable, all allocas are
		// dead at its end, which means stores to them are also dead.
		if (BB->getTerminator()->getNumSuccessors() == 0)
		MadeChange |= handleEndBlock(*BB, AA, nullptr, TLI, &AccessesToDelete);

		for (auto DeadInst : AccessesToDelete)
		deleteDeadInstructionMSSA(DeadInst, MSSA, TLI);

		return MadeChange;
		}

		static bool eliminateDeadStores(Function &F, AliasAnalysis *AA, MemorySSA *MSSA,
		DominatorTree *DT,
		const TargetLibraryInfo *TLI) {
		bool MadeChange = false;
		for (BasicBlock &BB : F)
		// Only check non-dead blocks. Dead blocks may have strange pointer
		// cycles that will confuse alias analysis.
		if (DT->isReachableFromEntry(&BB))
		MadeChange |= eliminateDeadStores(&BB, AA, MSSA, DT, TLI);

		return MadeChange;
		}

		//===----------------------------------------------------------------------===//
		// End of MemorySSA implementation.
		//===----------------------------------------------------------------------===//

		//===----------------------------------------------------------------------===//
 // DSE Pass
 //===----------------------------------------------------------------------===//
 PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {
   AliasAnalysis *AA = &AM.getResult<AAManager>(F);
   DominatorTree *DT = &AM.getResult<DominatorTreeAnalysis>(F);
-  MemoryDependenceResults *MD = &AM.getResult<MemoryDependenceAnalysis>(F);
   const TargetLibraryInfo *TLI = &AM.getResult<TargetLibraryAnalysis>(F);

+  if (UseMemorySSA) {
+    MemorySSA *MSSA = &AM.getResult<MemorySSAAnalysis>(F);
+    if (!eliminateDeadStores(F, AA, MSSA, DT, TLI))
+      return PreservedAnalyses::all();
+  } else {
+    MemoryDependenceResults *MD = &AM.getResult<MemoryDependenceAnalysis>(F);
-  if (!eliminateDeadStores(F, AA, MD, DT, TLI))
-    return PreservedAnalyses::all();
+    if (!eliminateDeadStores(F, AA, MD, DT, TLI))
+      return PreservedAnalyses::all();
+  }

   PreservedAnalyses PA;
   PA.preserve<DominatorTreeAnalysis>();
   PA.preserve<GlobalsAA>();
   PA.preserve<MemoryDependenceAnalysis>();
+  PA.preserve<MemorySSAAnalysis>();
   return PA;
 }

 namespace {
 /// A legacy pass for the legacy pass manager that wraps \c DSEPass.
 class DSELegacyPass : public FunctionPass {
 public:
   DSELegacyPass() : FunctionPass(ID) {
     initializeDSELegacyPassPass(*PassRegistry::getPassRegistry());
   }

   bool runOnFunction(Function &F) override {
     if (skipFunction(F))
       return false;

     DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
     AliasAnalysis *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
-    MemoryDependenceResults *MD =
-        &getAnalysis<MemoryDependenceWrapperPass>().getMemDep();
     const TargetLibraryInfo *TLI =
         &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();

+    if (UseMemorySSA) {
+      MemorySSA *MSSA = &getAnalysis<MemorySSAWrapperPass>().getMSSA();
+      MSSA->verifyMemorySSA();
+      return eliminateDeadStores(F, AA, MSSA, DT, TLI);
+    } else {
+      MemoryDependenceResults *MD =
+          &getAnalysis<MemoryDependenceWrapperPass>().getMemDep();
-    return eliminateDeadStores(F, AA, MD, DT, TLI);
+      return eliminateDeadStores(F, AA, MD, DT, TLI);
+    }
   }

   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.setPreservesCFG();
     AU.addRequired<DominatorTreeWrapperPass>();
     AU.addRequired<AAResultsWrapperPass>();
-    AU.addRequired<MemoryDependenceWrapperPass>();
     AU.addRequired<TargetLibraryInfoWrapperPass>();
     AU.addPreserved<DominatorTreeWrapperPass>();
     AU.addPreserved<GlobalsAAWrapperPass>();
+    if (UseMemorySSA) {
+      AU.addRequired<MemorySSAWrapperPass>();
+      AU.addPreserved<MemorySSAWrapperPass>();
+    } else {
+      AU.addRequired<MemoryDependenceWrapperPass>();
-    AU.addPreserved<MemoryDependenceWrapperPass>();
+      AU.addPreserved<MemoryDependenceWrapperPass>();
+    }
   }

   static char ID; // Pass identification, replacement for typeid
 };
 } // end anonymous namespace

 char DSELegacyPass::ID = 0;
 INITIALIZE_PASS_BEGIN(DSELegacyPass, "dse", "Dead Store Elimination", false,
                       false)
 INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
 INITIALIZE_PASS_END(DSELegacyPass, "dse", "Dead Store Elimination", false,
                     false)

 FunctionPass *llvm::createDeadStoreEliminationPass() {
   return new DSELegacyPass();
 }
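
The partial-overwrite path in the diff above trims a memory intrinsic when a later store overwrites its beginning or end. As a reading aid, the arithmetic can be sketched in isolation. This is only a minimal standalone sketch, not the patch's code: `Overwrite`, `Trim`, `isPowerOf2`, and `computeTrim` are hypothetical names introduced here, and plain integers stand in for the `MemoryLocation` sizes and the intrinsic's alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the OverwriteEnd / OverwriteBegin cases.
enum class Overwrite { End, Begin };

// Hypothetical result type: what the trimmed earlier write would look like.
struct Trim {
  bool Shorten;        // whether the alignment heuristic allows trimming
  int64_t NewLength;   // length of the earlier write after trimming, in bytes
  int64_t OffsetMoved; // how far its destination advances (Begin case only)
};

static bool isPowerOf2(int64_t V) { return V > 0 && (V & (V - 1)) == 0; }

// Mirrors the arithmetic in the diff under the assumptions stated above.
Trim computeTrim(Overwrite Kind, int64_t DepWriteOffset, int64_t DepSize,
                 int64_t InstWriteOffset, int64_t InstSize,
                 int64_t DepWriteAlign) {
  // For a Begin overwrite the relevant boundary is the end of the later store.
  if (Kind == Overwrite::Begin)
    InstWriteOffset += InstSize;

  // Same heuristic as the diff: only trim at a boundary that is a power of
  // two no smaller than the alignment, or a multiple of the alignment.
  bool Ok =
      (isPowerOf2(InstWriteOffset) && DepWriteAlign <= InstWriteOffset) ||
      (DepWriteAlign != 0 && InstWriteOffset % DepWriteAlign == 0);
  if (!Ok)
    return {false, DepSize, 0};

  int64_t Delta = InstWriteOffset - DepWriteOffset;
  if (Kind == Overwrite::End)
    return {true, Delta, 0};         // keep only the bytes before the boundary
  return {true, DepSize - Delta, Delta}; // drop the overwritten prefix
}
```

For example, under these assumptions a 32-byte write at offset 0 with alignment 4 that is followed by an 8-byte store at offset 24 would be trimmed to 24 bytes, while the same later store at offset 0 would also leave 24 bytes but advance the destination by 8.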

test/Transforms/DeadStoreElimination/simple.ll

 ; RUN: opt < %s -basicaa -dse -S | FileCheck %s
+; RUN: opt < %s -basicaa -memoryssa -dse -use-memoryssa-dse -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -S | FileCheck %s
 target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

 declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind
 declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind
 declare void @llvm.init.trampoline(i8*, i8*, i8*)

 define void @test1(i32* %Q, i32* %P) {