
MemorySSA backed Dead Store Elimination.
Needs ReviewPublic

Authored by rnk on Nov 27 2017, 2:02 AM.

Details

Reviewers
dmgreen
Summary

This is an upgrade of DSE to use MemorySSA instead of MemDep, which allows it to work across basic blocks in a sparser manner.

Halfway into making this I found D29624 by bryant, which is an attempt at the same thing, so I stole all their good ideas. There is also D29866, which is a PDSE pass by the same author. As far as I understand, that would be superior but harder to write. Unfortunately both seem to have been abandoned.

I believe this version should handle everything that the old MemDep version does (it passes all the tests). This includes complete overwrites (so long as the later store post-dominates), noop stores, partial overwrites, stores before frees/lifetime_ends and PartialEarlierWithFullLater.

The only exception that I know of is the coroutine tests, which rely on removing stores to soon-to-be-freed data, even across function calls that may throw. See test37 in simple.ll and ex3.ll in the coroutine tests. It should be possible to get that working, but it might involve looking through the llvm.coro.begin.

Putting this up for early review; I still need to do some extra testing/benchmarking/compile-time measurement etc. I added as subscribers anyone who looked interested in D29624. This is a fairly big chunk of code, so let me know if I can do anything to make it easier to review.

Event Timeline

dmgreen created this revision.Nov 27 2017, 2:02 AM

Just a suggestion: To ease review, could this be split into smaller patches along the lines you mention - noop stores, partial overwrites, stores before frees/lifetime_ends ?

Yep, that may be an option, although the various things it does can be interrelated. There are probably some semi-sensible ways to split this up, though, and having the whole thing up can hopefully make things clearer. It may be easier to do in one go rather than spend time on multiple reviews, or maybe not.

I will leave that as an option for whoever is willing to take a look at the review. Let me know.

Does this support any transforms which aren't supported by the memdep-based DSE?

lib/Transforms/Scalar/DeadStoreElimination.cpp
1425 ↗(On Diff #124334)

It should be very easy to implement memoryIsNotModifiedBetween using MSSA, I think? But not a big deal to leave it for now.

1469 ↗(On Diff #124334)

This is a confusing name, given what the function checks.

1651 ↗(On Diff #124334)

Is it possible for any DSE-related transform to invalidate this?

I'm not sure I really understand how you're using MayThrowsPostOrders here... more comments would be helpful.

test/Transforms/DeadStoreElimination/simple.ll
555 ↗(On Diff #124334)

Yes, you're right; I guess I missed that case when I fixed the other noalias issues in DSE.

dberlin added inline comments.Nov 29 2017, 1:11 AM
lib/Transforms/Scalar/DeadStoreElimination.cpp
1324 ↗(On Diff #124334)

Thinking about this not too hard, my initial thought is that this should be unnecessary, IMHO. I'm curious what's happening if it's not.

Assuming we did not hit the optimization limits during MemorySSA use building, any use points to a store that may- or must-aliases it.
(If we hit the opt limits, you probably want to respect them anyway :P).

So this check should just be flat-out unnecessary when current == original access. If that has a memory use, it's not dead on the path to the MemoryUse (ignoring PDSE, that means it's not dead overall).

When walking from def to def (i.e. when current != original access), that leaves two cases:
The store it's a use for may-aliases your original access's siloc - your store is not dead.
The store it's a use for must-aliases your original access's siloc - your stores are the same (and so maybe one is dead).
Note that in either case, what the use is for or aliases is irrelevant. The only thing relevant to the next access is the store aliasing. The uses force your store alive or not based on the store they are linked to and its equivalence to your original one.

Put another way: I can't see a reason to check explicitly whether the use aliases you. You only have to check whether use->getDefiningAccess->memloc may-aliases siloc.
MemorySSA has already checked whether the use may-aliases its def for you.

You can cache that, and now your walk is much faster because you only have to know things about the defs and not the uses.

Again, i could be completely wrong, but i'd like to understand why :)

It's true that aliasing is not transitive in theory, but in LLVM the vast majority of cases where it isn't have metadata attached, and you would only need to check here in those cases if you are worried about catching every case.
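The walk Daniel describes can be sketched roughly as follows. This is a hypothetical, heavily simplified Python model, not the patch's C++ and not the MemorySSA API: defs form a chain, `cached_may_alias` stands in for a cached AA query, and the walk only ever inspects defs, never the individual uses hanging off them.

```python
from functools import lru_cache

class MemoryDef:
    """Toy stand-in for a MemorySSA MemoryDef: a stored-to location plus
    a link to its defining access (the next def up the chain)."""
    def __init__(self, name, loc, defining=None):
        self.name, self.loc, self.defining = name, loc, defining

@lru_cache(maxsize=None)
def cached_may_alias(loc_a, loc_b):
    # Stand-in for an AA query; in this toy model two locations alias
    # iff they name the same object. Real answers would be cached the
    # same way, keyed on the location pair.
    return loc_a == loc_b

def first_aliasing_def(start, si_loc):
    """Walk def -> def upward from `start` and return the first def
    whose location may-aliases si_loc; uses of intermediate defs are
    never re-queried, only the defs themselves."""
    cur = start.defining
    while cur is not None:
        if cached_may_alias(cur.loc, si_loc):
            return cur
        cur = cur.defining
    return None
```

On a chain like store Q; store P; store Q, walking up from the last store skips the non-aliasing P def and lands on the first Q def, with one cached query per def.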

1689 ↗(On Diff #124334)

FWIW: You can do better than this, but it's complicated.

We could build a form with factored stores and multiple phis/sigmas if it is really worth it.

You are also going to end up rediscovering the same data here again and again (i.e. for each store, because you are working backwards and they are all linked, you will eventually hit the same use chain you just saw; for a given memloc, or anything that is a must-alias of that memloc, the answers must be the same). There are various hashing/pair/etc. schemes you can use to cache answers. Not sure it is worth it.

Especially vs building a real form for stores.

Hello all. Thanks for taking a look, much appreciated. The main thing this does over the old version is work across basic blocks, which is why we here are interested. In something like this:

for (int i = ..) {
    X[i] = 0;
    for (int j = ..)
        X[i] += Y[j];
}

The inner X[i] store will currently be pulled out of the loop by LICM, but the original X[i] = 0 will remain as a dead store.

lib/Transforms/Scalar/DeadStoreElimination.cpp
1324 ↗(On Diff #124334)

This is one of those things where I remember changing the "if" here to an assert, seeing it assert a lot and figuring it was then needed. A lot of those cases will be benign though. I took another look, and after wading through a few csmith cases where extra stores are removed with this check, something like the following is where this comes up:

define void @test(i32* %P, i32* noalias %Q, i32* %R) {
; 1 = MemoryDef(liveOnEntry)
  store i32 1, i32* %Q
; 2 = MemoryDef(1)
  store i32 2, i32* %P
; 3 = MemoryDef(2)
  store i32 3, i32* %Q
; MemoryUse(2)
  %1 = load i32, i32* %R
  ret void
}

store 3 to Q can be removed, but as MemAccess "2" has 2 operands, the second Use would cause us to fail as we walk from 2 to 3.

I'm not sure if this is worth the cost though. I don't have a great grasp of what will end up being expensive. I will try some benchmarks and try to get some compile time numbers to see, and if there are not any useful cases where this comes up, I'll remove it. My first go at getting compile time numbers was too noisy to be useful.

One thing I never looked into very much while creating this was the internals of MemorySSA. MemSSA seemed to just work well enough not to need it. The only thing I remember coming up was cases like this:

define void @test16(i32* noalias %P) {
  %P2 = bitcast i32* %P to i8*
  store i32 1, i32* %P
  br i1 true, label %bb1, label %bb3
bb1:
  store i32 1, i32* %P
  br label %bb3
bb3:
  call void @free(i8* %P2)
  ret void
}

We remove the store 1 in the middle, leaving this:

define void @test16(i32* noalias %P) {
  %P2 = bitcast i32* %P to i8*
; 1 = MemoryDef(liveOnEntry)
  store i32 1, i32* %P
  br i1 true, label %bb1, label %bb3
bb1:                                              ; preds = %0
  br label %bb3
bb3:                                              ; preds = %bb1, %0
; 4 = MemoryPhi({%0,1},{bb1,1})
; 3 = MemoryDef(4)
  call void @free(i8* %P2)
  ret void
}

The memory phi remains with two operands, both coming from "1". I presume this is OK, and it's easy enough to work around, but it is different from a freshly constructed MemorySSA graph.

1469 ↗(On Diff #124334)

This is very sparse on comments, isn't it. My bad; I'll try and add some more in here. The important part of this function is to check that there are no throwing instructions between the SI and NI, except where SI is an Alloca/AllocLikeFn that does not escape.

1651 ↗(On Diff #124334)

More comments, got it, will do. This is one of the good ideas that came from D29624 (any bugs are still on me, of course :)

Nothing should invalidate the post orders. It's the relative order that we care about, and DSE only removes instructions, never re-orders them. So if there is a throw between two PO numbers, the throw will remain in its position when we remove one of the stores.

I was just thinking more about this, and it may end up giving odd results at times. This linearising of the PO numbers is dependent on the order the blocks are visited. The store of 1 here is removed:

define void @test(i32* noalias %P) {
    br i1 true, label %bb1, label %bb2
bb1:
    store i32 1, i32* %P
    br label %bb3
bb2:
    call void @unknown_func()
    br label %bb3
bb3:
    store i32 0, i32* %P
    ret void
}

But changing it to "br i1 true, label %bb2, label %bb1", the store won't be removed. That doesn't sound like something we want. Maybe this should keep the may-throw instructions around and check whether they are really in the way. I'll look into this, but may need to read a book.
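The traversal-order dependence can be seen with a small sketch. This is illustrative Python, none of it is the patch's code: a DFS post-order is computed twice over the same diamond CFG, visiting successors in different orders, and the interval check "is the may-throw block numbered between the two store blocks?" flips.

```python
def post_order(succs, entry, visit_order):
    """DFS post-order numbering, visiting successors in the given
    preference order."""
    seen, order = set(), []
    def dfs(b):
        seen.add(b)
        for s in sorted(succs.get(b, []), key=visit_order.index):
            if s not in seen:
                dfs(s)
        order.append(b)
    dfs(entry)
    return {b: i for i, b in enumerate(order)}

# Diamond CFG matching the IR above: store in bb1, may-throw call in
# bb2, overwriting store in bb3.
succs = {"entry": ["bb1", "bb2"], "bb1": ["bb3"], "bb2": ["bb3"]}

def throw_numbered_between(po):
    # The linearised check: does the may-throw block's PO number fall
    # between the two stores' PO numbers?
    lo, hi = sorted((po["bb1"], po["bb3"]))
    return lo < po["bb2"] < hi

po_bb1_first = post_order(succs, "entry", ["entry", "bb1", "bb2", "bb3"])
po_bb2_first = post_order(succs, "entry", ["entry", "bb2", "bb1", "bb3"])
# Same CFG, different successor order: the answer flips, so whether the
# store in bb1 looks removable depends on the traversal.
```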

1689 ↗(On Diff #124334)

OK. As I said above, I never looked deeply into the internals of MemorySSA. Doing the optimisation of stores in the MemorySSA graph is what GCC does, right? So it only needs to look at a store's immediate uses to find dead stores, and look through phis.

My understanding is that currently only DSE would need this; the other users of MemorySSA (GVN, LICM, etc.) get by without optimised stores. So if we did it that way, it should be an optional thing that DSE can then use. Do you think that way would be better? We will still have to walk through phis, but that's fairly trivial.
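The "immediate uses, looking through phis" scheme described above might look roughly like this. It is a toy Python model, not GCC's or the patch's implementation: `users` is a hypothetical map from a memory access to its users, and `full_overwrite` records whether a given store completely overwrites the candidate.

```python
def store_is_dead(access, users, full_overwrite):
    """A store's access is dead if every user, looking through phis, is
    another store that completely overwrites it; any load (or a store
    that only partially clobbers) keeps it alive.
    `users` maps access -> [(kind, user)] with kind in {'phi', 'store',
    'load'}; `full_overwrite` maps a store to True if it fully
    overwrites the candidate."""
    work, seen = list(users.get(access, [])), set()
    while work:
        kind, u = work.pop()
        if u in seen:
            continue
        seen.add(u)
        if kind == "load":
            return False                      # the stored value is read
        if kind == "phi":
            work.extend(users.get(u, []))     # look through the MemoryPhi
        elif not full_overwrite.get(u, False):
            return False                      # clobbered, not fully overwritten
    return True
```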

I will try and get some compile time results together and see how things look.

test/Transforms/DeadStoreElimination/simple.ll
555 ↗(On Diff #124334)

Thanks for checking. I feel that pesky handleEnd function was doing more than it should have.

dmgreen updated this revision to Diff 125941.Dec 7 2017, 6:30 AM
dmgreen edited the summary of this revision. (Show Details)

New changes. Including changing some of how maythrows are handled, some extra comments and some optimisations for slow operations.

OK. I have some performance numbers. I'm compiling clang ("ninja clang") and using
-ftime-report/-stat to get info (with some extra precision for decimal places) and
summing the results for all the compiled files. The total runtime is a little noisy on
this machine, but these sub-numbers seem pretty stable between runs.

Firstly the good news: with this version we now remove more dead stores.
Old: 41310 New: 51660
With my "MemSSA can enable us to remove more stores" hat on, this is good stuff.

Some more good news is that DSE is now quicker, for the sum of time for each file:
Old: ~26s New: ~19s

The bad news is that we also need to add in the MemorySSA passes. I think we now
calculate this twice in the pipeline, not once as before, so the times roughly double.
Old: ~35s New: ~69s
I'm hoping that in the long run we can share the cost of this between other passes.
NewGVN is a couple of hops earlier in the LTO pass pipeline, and LICM is also quite close
in the normal one. Hopefully this cost can be shared out.

The other bad news is we use a post-dom tree (again, maybe sharable?):
Old: ~15s New: ~27s
But MemDep is somehow now quicker:
Old: ~13s New: ~8.5s

The total runtime here was on the order of 10000s, so it's hard to pick out the overall
cost exactly. These results suggest that the total is now ~30s more, and excluding
MemSSA we are at roughly the same time.

I'm going to try and take a look at the most costly files and see if we can knock the most
expensive ones down without making the total slower. As Daniel mentioned, there are some
good candidates for caching results here, like those in isOverwrite.

Maths isn't on my side for making the whole thing quicker. But it removes more dead stores :)

Ping.

Any further progress here?

aqjune added a subscriber: aqjune.May 29 2018, 7:35 PM
rnk commandeered this revision.Sep 5 2018, 3:49 PM
rnk added a reviewer: dmgreen.
rnk added a subscriber: rnk.

I started rebasing this, I'll upload it.

rnk updated this revision to Diff 164118.Sep 5 2018, 3:49 PM
rnk retitled this revision from MemorySSA backed Dead Store Elimination. to MemorySSA backed Dead Store Elimination..
  • rebase
jfb added a comment.Sep 5 2018, 3:59 PM

Can you add tests for volatile?

Thanks for taking a look at this; it would be good to make use of it. You may want to check the llvm-commits mails for more of Daniel's thoughts, as they unfortunately didn't make it into Phabricator.

IIRC my plan was to add some sort of caching to isOverwrite, to reduce the number of required calls to getUnderlyingObject and whatever else in that function can take time. That helped speed things up in some cases, but still left some cases where this wasn't as quick as I would like. It may still be quicker than the "old" pass overall.

There is also a patch somewhere to make LICM preserve PDTs, which would mean we get one for free. That's not needed before this though; it just cuts down the time again. From the old tests I did, it seemed that this version of DSE was quicker, but the analyses we need are not free. I was looking for cross-basic-block DSE, so was happy enough paying the price. I believe you are looking for something that has fewer degenerate compile-time regressions?

Can you add tests for volatile?

The idea here would be that enable-dse-memoryssa would be set to true eventually, so this should pass every existing DSE test. The old parts of this file could then be deleted. The multiblock tests are new, as that's not something the old pass could do. I'm not sure if this will still pass all the existing tests, or if the old DSE algorithm has learnt new tricks since then.

jfb added a comment.Sep 6 2018, 9:00 AM

Can you add tests for volatile?

The idea here would be that enable-dse-memoryssa would be set to true eventually, so this should pass every existing DSE test. The old parts of this file could then be deleted. The multiblock tests are new, as that's not something the old pass could do. I'm not sure if this will still pass all the existing tests, or if the old DSE algorithm has learnt new tricks since then.

Old LLVM code is often poorly tested. The DSE directory has two volatile stores, one volatile load, and no volatile atomics. It has a decent amount of atomic load/store, and no atomic RMW or cmpxchg. I agree that you want to pass the old tests (potentially with tweaks as some things change), but the current volatile and atomic situation is not solid ground for an algorithm swap.

I *want* DSE of dead stores around volatile and atomic operations. I'm not asking for this patch to implement it; I just want basic correctness so that, now and once we start being more aggressive, we know there's no breakage.

fhahn added a subscriber: fhahn.Jan 11 2019, 1:35 PM
fhahn added a comment.Fri, Dec 6, 5:36 AM

Hi!

I would be interested in giving dead store elimination using MemorySSA another push. We've hit plenty of DSE limitations, especially together with -ftrivial-auto-var-init, and I think modernizing DSE to use MemorySSA will help us improve the situation.

@rnk, @dmgreen, are you still interested in pushing for DSE + MSSA? Otherwise I would be happy to look into that. I think this patch is a great foundation, and it should be possible to break it up into a few distinct parts, starting with just replacing completely overwritten stores and adding additional cases as follow-ups. I also have a few potential simplifications in mind I'd like to try. I rebased the patch and tried to build the SPEC2006/MultiSource tests, but it does not get very far without crashing. Again, my main goal would be to break it up into more manageable pieces :)

Looking a bit further ahead, I'd also like to lift the post-domination requirement for cases where we can prove that an overwriting store exists on all paths to the exit/other reads.

Please let me know what you think!

Herald added a project: Restricted Project. Fri, Dec 6, 5:36 AM
Herald added a subscriber: asbirlea.
rnk added a comment.Fri, Dec 6, 3:04 PM

Yes, please feel free to grab this and take it forward. I got this rebased and realized it was going to be much more work to get it thoroughly tested.

Tyker added a subscriber: Tyker.Sat, Dec 7, 6:22 AM

Sounds great. It would be good to see some improved DSE in LLVM.

This will have bitrotted heavily over the last 2 years. I can imagine that most of the parts of the "old DSE" it was using will have changed, leading to different assumptions.

The algorithm here may not be the greatest thing ever created, but I think it is probably better than the current DSE in tree at the moment (which I have a low opinion of, but which is battle-tested). It can handle cross-basic-block dependencies, for example, which is awesome, but that can come at a price. There are places in here that are O(n^2), which I did run into on a normal bootstrap in my testing. They weren't super slow, and the old DSE can have the same problems, but it's something to watch out for. Both GCC and Swift have different DSE algorithms: IIRC, Swift's propagates store values backwards, and GCC's is closer to PDSE (which is like GVN but in reverse, from what I understand).