This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk).
ClosedPublic

Authored by fhahn on Jan 14 2020, 6:12 AM.

Details

Summary

This patch adds a first version of a MemorySSA-based DSE. It is missing
a lot of features, which will be added as follow-ups, to help keep
the review manageable.

The patch uses the following general approach: given a MemoryDef, walk
upwards to find clobbering MemoryDefs that may be killed by the
starting def. Then check that there are no uses that may read the
location of the original MemoryDef in between both MemoryDefs. A bit
more concretely:

For all MemoryDefs StartDef:

  1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards.
  2. Check that there are no reads between DomAccess and StartDef by checking all uses starting at DomAccess and walking until we see StartDef.
  3. For each found DomDef, check that:
    1. There are no barrier instructions between DomDef and StartDef (like throws or stores with ordering constraints).
    2. StartDef is executed whenever DomDef is executed.
    3. StartDef completely overwrites DomDef.
  4. Erase DomDef from the function and MemorySSA.
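
A rough C++ sketch of this walk follows. It is illustrative only: the helpers isReadClobberBetween, isDSEBarrierBetween, isCompleteOverwrite and deleteDeadInstruction are hypothetical stand-ins for the patch's actual AA-based checks (assumes the usual MemorySSA.h, MemoryLocation.h and PostDominators.h headers):

for (MemoryDef *StartDef : Stores) {
  auto *StartI = cast<StoreInst>(StartDef->getMemoryInst());
  MemoryLocation DefLoc = MemoryLocation::get(StartI);
  // 1. Walk upwards to the next dominating clobbering access.
  MemoryAccess *DomAccess =
      MSSA.getWalker()->getClobberingMemoryAccess(StartDef);
  auto *DomDef = dyn_cast<MemoryDef>(DomAccess);
  if (!DomDef || MSSA.isLiveOnEntryDef(DomDef))
    continue;
  // 2. No use between DomDef and StartDef may read the location.
  if (isReadClobberBetween(DomDef, StartDef, DefLoc))
    continue;
  // 3. No barriers, StartDef executes whenever DomDef does, and StartDef
  //    completely overwrites DomDef.
  if (isDSEBarrierBetween(DomDef, StartDef) ||
      !PDT.dominates(StartDef->getBlock(), DomDef->getBlock()) ||
      !isCompleteOverwrite(DefLoc, DomDef->getMemoryInst()))
    continue;
  // 4. Remove the dead store from the IR and from MemorySSA.
  deleteDeadInstruction(DomDef->getMemoryInst(), MSSA);
}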

The patch uses a very simple approach to guarantee that no throwing
instructions are between 2 stores: we only handle cases where the accesses
are to stack objects, where both accesses are in the same basic block and
that block contains no throwing instructions, or where the function contains
no throwing instructions at all. This restriction will be lifted later.
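
A minimal sketch of that check, with hypothetical helper structure (not the patch's actual code; needs llvm/IR/InstIterator.h and llvm/ADT/STLExtras.h):

// Returns true if a throwing instruction could make the dead store visible.
static bool mayThrowBetween(Instruction *KillingI, Instruction *DeadI,
                            const Value *UnderlyingObj, Function &F) {
  // Stores to stack objects cannot be observed by the caller after an unwind.
  if (isa<AllocaInst>(UnderlyingObj))
    return false;
  // Both accesses in one block: only that block needs to be throw-free.
  if (KillingI->getParent() == DeadI->getParent())
    return any_of(*KillingI->getParent(),
                  [](Instruction &I) { return I.mayThrow(); });
  // Otherwise require the whole function to be throw-free.
  return any_of(instructions(F), [](Instruction &I) { return I.mayThrow(); });
}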

Besides adding support for the missing cases, there is plenty of additional
potential for improvement as follow-up work, e.g. in the way we visit stores
(this could be just a traversal of the MemorySSA, rather than collecting them
up-front) or in using the alias information discovered during walking to
optimize the MemorySSA.

This is loosely based on D40480 by Dave Green.

Event Timeline

fhahn created this revision.Jan 14 2020, 6:12 AM


asbirlea added inline comments.Jan 14 2020, 12:37 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1472

MemorySSA should handle this; if it doesn't, we should fix that.

1536

Nit: Use only DomDef, known to be non-null outside the loop.
Restrict scope of DomAccess to inside the loop above.

1539

Walking uses may miss cases due to aliasing not being transitive. This needs to be thoroughly analyzed. Here's a very rough example.

1 = Def(LoE) ; <----- DomDef  stores [0,3]
2 = Def(1) ; (2, 1)=NoAlias,   stores [4,7]
Use(2)      ; MayAlias 2 *and* 1, the Use points to the *first* Def it may alias, loads [0, 7].
3 = Def(1)  ; <---- Current  (3, 2)=NoAlias, (3,1)=MayAlias,  stores [0,3]

The situation may be simplified because we are handling stores, but all Uses may need to be looked at. Note that recursing on uses of the uses will not work either.
Rough example why:

1 = Def(LoE)
2 = Def(1)     ; <----- DomDef 
3 = Def(1) ; (3, 2)=NoAlias
Use(3)      ; MayAlias 3 *and* 2, the Use points to the *first* Def it may alias.
4 = Def(2)  ; <---- Current  (4, 3)=NoAlias, (4,2)=MayAlias
1541

You're right this can be partially lifted.
Here's an example:

%a:
  1 = Def(LoE)  ; <----- DomDef 
   br %cond1, %b, %c
%b:
  2 = Def(1)
   br %cond2, %d, %e
%c:
   br %e
%d:
   br %f
%e:
   3 = phi(1,2)
   br %f
%f:
   4 = Phi(2,3)
   5 = Def(4) ; <---- Current
1744

I don't see how this being in a loop will work. Shouldn't this be a "give up"?
Example:

; 1 = MemoryDef (LoE)
store a
; 2 = MemoryDef(1)
call_reading_a_and_overwriting_a
; 3 = MemoryDef(2)
store a

Once getClobbering finds that 3 may alias 2, def 1 cannot be removed, even if an overwrite is not proven.

I may have missed such a check in getDomMemoryDef.

Tyker added a subscriber: Tyker.Jan 15 2020, 4:23 AM

I did a few experiments with a bottom-up algorithm before the patch I showed on Phabricator. My implementation of the bottom-up approach had similar average compile-time to the current pass on the test-suite, but it was only barely removing more stores than the current pass.
I gave up on it because I didn't find a good way forward to make it deal with cases like the following:

  store i32 0, i32* %Ptr   ; DEAD
  br i1 %cond, label %a, label %b
a:
  store i32 2, i32* %Ptr
  br label %c
b:
  store i32 2, i32* %Ptr
  br label %c

This is IMO definitely something we want to handle; by the way, GCC does this optimization: https://godbolt.org/z/VFBf-c
Maybe there is a bottom-up way to deal with this that I haven't thought about. Any thoughts?

Would you be interested in a port of my top-down algorithm from D72182 to work with this patch series' framework?
I have gotten many ideas since I wrote it for how to improve it, mostly on the compile-time side.

fhahn added a comment.Jan 15 2020, 5:53 AM

I did a few experiments with a bottom-up algorithm before the patch I showed on Phabricator. My implementation of the bottom-up approach had similar average compile-time to the current pass on the test-suite, but it was only barely removing more stores than the current pass.
I gave up on it because I didn't find a good way forward to make it deal with cases like the following:

  store i32 0, i32* %Ptr   ; DEAD
  br i1 %cond, label %a, label %b
a:
  store i32 2, i32* %Ptr
  br label %c
b:
  store i32 2, i32* %Ptr
  br label %c

This is IMO definitely something we want to handle; by the way, GCC does this optimization: https://godbolt.org/z/VFBf-c
Maybe there is a bottom-up way to deal with this that I haven't thought about. Any thoughts?

Yes, I think we should make sure to cover this case. One approach could be to optimize the defining accesses of the accesses we start our bottom-up walks with. After visiting all MemoryDefs, to detect cases like the one above, we would just have to look for MemoryDefs D with multiple MemoryDefs as users (which we could probably collect along the way). Then we would just have to check that D is not read in between. Does that sound sensible?
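
A hedged sketch of that idea (completelyOverwrites, coverAllPathsFrom, isReadBetween and markForRemoval are hypothetical helpers, not actual patch code):

// After the main bottom-up walk: look for MemoryDefs with several
// overwriting MemoryDef users, e.g. one per branch of a diamond.
for (MemoryDef *D : AllMemoryDefs) {
  SmallVector<MemoryDef *, 4> Killers;
  for (User *U : D->users())
    if (auto *UD = dyn_cast<MemoryDef>(U))
      if (completelyOverwrites(UD, D))
        Killers.push_back(UD);
  // D is dead if the killers cover every path from D to the function exit
  // and nothing reads the location in between.
  if (Killers.size() > 1 && coverAllPathsFrom(D, Killers) &&
      !isReadBetween(D, Killers))
    markForRemoval(D);
}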

Would you be interested in a port of my top-down algorithm from D72182 to work with this patch series' framework?
I have gotten many ideas since I wrote it for how to improve it, mostly on the compile-time side.

Yes, but I think it would be good to settle the bottom-up vs. top-down question first.

fhahn marked 2 inline comments as done.Jan 15 2020, 6:44 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1539

Thanks for the example. I think I might be missing something for the optimized version though. From the example in MemorySSA.h:

/// Given this form, all the stores that could ever effect the load at %8 can be
/// gotten by using the MemoryUse associated with it, and walking from use to
/// def until you hit the top of the function.

From the comment above, shouldn't we visit both 2 and 3 when walking up from Use(3), as both may change the read location? In the example we would only see 3 and 1.

But even assuming we would visit both 2 and 3, I can see how we could end up with scenarios where we miss overlapping reads. I think we would have to look at all users of DomDef, and we specifically *cannot* skip any non-aliasing MemoryDefs for the read checks.

That would make things more expensive, but it is something we have to do regardless of going bottom-up or top-down. Does that make sense to you? However, I think that means that in the worst case we would have to do a top-down walk for the read checks, similar to the general top-down approach.

1744

I had another look at getClobberingMemoryAccess, and we indeed need an additional check ensuring that the MemoryDef does not also read the original location!

Regarding handling the branching example, the solution fhahn proposes sounds reasonable to me.
I was thinking something similar, along the lines of: do the checks bottom-up from both second and third stores during the main algorithm, both find the dead store but do not postdominate it; don't abandon them, but keep this info (include the checks for no intervening reads) for after the main search. The data structure could be a Hashmap<PotentialDeadStore, ListOfDefsOrBlocksWhoFoundThisPotentialDeadStore>. If all paths are covered in the List for a PotentialDeadStore, then it's truly dead.
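
A hedged sketch of that bookkeeping (illustrative names, not actual patch code):

// Map each potentially dead store to the defs whose bottom-up walks found it;
// once those cover all paths from the store, it is truly dead.
DenseMap<MemoryDef *, SmallVector<MemoryDef *, 4>> PotentialDeadStores;

void recordPartialKill(MemoryDef *PotentialDeadStore, MemoryDef *Killer) {
  PotentialDeadStores[PotentialDeadStore].push_back(Killer);
}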

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1539

I think I know what I missed. MemoryDefs keep two fields, the defining access and the optimized access. So 3 = Def(1) actually looks like this: 3 = Def(2), Optimized: 1, and it is a user of both 1 and 2.

Yes, I think you're right that the read checks for all accesses are needed regardless of which approach is taken. And yes, it will be expensive.
I came across something similar in LICM, where I limited or outright avoided analyzing all uses against a store to avoid that cost (see ~LICM.cpp:1300).

Regarding handling the branching example, the solution fhahn proposes sounds reasonable to me.
I was thinking something similar, along the lines of: do the checks bottom-up from both second and third stores during the main algorithm, both find the dead store but do not postdominate it; don't abandon them, but keep this info (include the checks for no intervening reads) for after the main search. The data structure could be a Hashmap<PotentialDeadStore, ListOfDefsOrBlocksWhoFoundThisPotentialDeadStore>. If all paths are covered in the List for a PotentialDeadStore, then it's truly dead.

This seems like it should work.

By the way, do you have any thoughts on adding all calls that may throw to the MemorySSA graph, even if they are specified to not access memory?
Being able to rely on this would simplify things greatly and probably speed up any DSE algorithm. And it is a corner case: readnone functions that may throw can't occur in C++, and I expect most other languages to be the same.

fhahn planned changes to this revision.Jan 17 2020, 9:31 AM

Regarding handling the branching example, the solution fhahn proposes sounds reasonable to me.
I was thinking something similar, along the lines of: do the checks bottom-up from both second and third stores during the main algorithm, both find the dead store but do not postdominate it; don't abandon them, but keep this info (include the checks for no intervening reads) for after the main search. The data structure could be a Hashmap<PotentialDeadStore, ListOfDefsOrBlocksWhoFoundThisPotentialDeadStore>. If all paths are covered in the List for a PotentialDeadStore, then it's truly dead.

This seems like it should work.

Sounds good! I'll update this patch in the next few days to improve the read-clobber checks as suggested. I'll update the follow-on patches to work with the bottom-up approach as well.

fhahn updated this revision to Diff 241544.Jan 30 2020, 12:27 PM

Address comments, rework to correctly check for reads between Current and DomAccess by visiting all uses (including non-aliasing MemoryDefs) until we reach the original def.

There are some cases where we might visit more uses than necessary (e.g. if Current does not post-dominate DomAccess), but I think it is better to get a correct and fairly complete version in first and then tune for compile-time. I will also post a bunch of follow-up changes that implement various additional cases. I measured compile-time on X86 for the full patch series (covering a lot more cases), and without much tuning the worst compile-time difference is around 1.5%.

It would be great if you could have another look.


fhahn edited the summary of this revision.
fhahn added a comment.Jan 30 2020, 6:45 PM

Regarding handling the branching example, the solution fhahn proposes sounds reasonable to me.
I was thinking something similar, along the lines of: do the checks bottom-up from both second and third stores during the main algorithm, both find the dead store but do not postdominate it; don't abandon them, but keep this info (include the checks for no intervening reads) for after the main search. The data structure could be a Hashmap<PotentialDeadStore, ListOfDefsOrBlocksWhoFoundThisPotentialDeadStore>. If all paths are covered in the List for a PotentialDeadStore, then it's truly dead.

This seems like it should work.

I've experimented with a different, more lightweight approach to support that case, which seems to fit more naturally: if we insert MemoryUses that clobber all locations visible to the caller just before each exit, we should be able to cover this case directly with the read checks: D73763.

Some small comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1463

s/modeled/modelled
s/MemroySSA/MemorySSA

1465

Consider adding a cl::opt limit on the number of things you look to process in F. We've had issues with generated code where a BB has on the order of thousands of stores.
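
For reference, such a limit is usually exposed as a hidden cl::opt; a minimal sketch (the flag name and default here are made up for illustration; needs llvm/Support/CommandLine.h):

static cl::opt<unsigned> MemorySSADefsPerBlockLimit(
    "dse-memoryssa-defs-per-block-limit", cl::init(5000), cl::Hidden,
    cl::desc("The maximum number of MemoryDefs considered per basic block "
             "(default = 5000)"));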

1587

Reduce scope of DomAccess to inside the do-while loop and use DomDef here, hence no need for the if (isa<MemoryDef>(DomAccess)) check - see previous comment (please mark them as done when updating).

1592

SmallSetVector to avoid processing duplicates?
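
For illustration, a SmallSetVector-based worklist would look roughly like this (sketch only; needs llvm/ADT/SetVector.h):

SmallSetVector<MemoryAccess *, 32> WorkList;
auto PushMemUses = [&WorkList](MemoryAccess *Acc) {
  for (User *U : Acc->users())
    WorkList.insert(cast<MemoryAccess>(U)); // insert() skips duplicates
};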

1643

Can this happen? Wouldn't the getClobbering above have found UseAccess/UseInst instead of DomDef?

1686

s/arn't/aren't

fhahn updated this revision to Diff 242070.Feb 3 2020, 7:29 AM
fhahn marked 16 inline comments as done.
fhahn edited the summary of this revision.

Address comments, thanks!

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1463

I'm not a native speaker, but I think modeled is the US spelling and modelled is the UK spelling. I thought we prefer the US spelling, but I don't mind either way.

1465

I've added a rather generous limit for Defs per BB, but we can always adjust it later. We still have to check the whole basic block for throwing instructions.

1472

It's handled by MemorySSA; I've dropped that.

1539

I've added a test for the scenario: overlapping_read in multiblock-simple.ll

1541

Restrictions related to MemoryPhis are lifted in D72148.

1592

Done. I *think* in the initial version the only nodes we might add multiple times are MemoryPhis and we bail out on the first one we see. I've added such a test to llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-memoryphis.ll

1643

I think it can happen in loops where a store in the header is killed by a store in the exit, but the location is also stored to in the loop body. I've added a few additional test cases (loop_multiple_def*) to llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-loops.ll.

I've also moved the post-dominance check into the function and updated the check here to skip PushMemUses for MemoryDefs that completely overwrite DefLoc. This helps avoid unnecessary checks.


asbirlea marked 4 inline comments as done.Feb 3 2020, 11:48 AM
asbirlea added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1463

Yes, you're right, I'm not a native speaker either. Thank you for the correction!

1465

sgtm

1592

I think when you process uses, you may find two different operands for a Def, if that Def is optimized. Something like:

1 = Def(LoE)
2 = Def (1)
3 = Def (2) - Optimized field = 1

So when adding the uses of 1, one may add (2, 3); then when processing 2, (3) is a use and gets added again. This may be too convoluted to actually happen, but I thought it's worth having the safety net. Thank you for the update!

1643

I see, yes. I wasn't thinking of the case with the MemoryPhi restriction lifted.

1679

s/it's/its

fhahn updated this revision to Diff 242172.Feb 3 2020, 1:33 PM
fhahn marked 3 inline comments as done.

Change it's -> its and regenerate check lines.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1592

Thanks for the example! That should be handled now :)

1679

Updated, thanks!


Some more minor comments, but I think this is a reasonable first version to check in.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1378

s/there/there are

1518

Can you add a comment here, along the lines of:
UseInst has a MemoryDef associated in MemorySSA. It's possible for a MemoryDef to not write to memory, e.g. a volatile load is modeled as a MemoryDef.

1687

The MemDep variant of DSE also attempts to keep debug info. Does this also make sense here?

// Try to preserve debug information attached to the dead instruction.
salvageDebugInfo(*DeadInst);
1764

Nit: Move declaration of Next inside while condition?

Just curious..

Can you compare this solution vs. GCC's solution vs. PDSE (https://reviews.llvm.org/D29866)?

fhahn updated this revision to Diff 242423.Feb 4 2020, 1:44 PM
fhahn marked 4 inline comments as done.

Addressed latest comments, thanks!

Just curious..

Can you compare this solution vs. GCC's solution vs. PDSE (https://reviews.llvm.org/D29866)?

I unfortunately do not have in-depth knowledge of either, but I think GCC uses a similar approach (I think GCC's virtual operands are similar to MemorySSA).

IIUC PDSE tries to remove partially redundant stores, by inserting/moving stores to split points. I think that is mostly orthogonal to the MemorySSA-backed DSE in this patch, although there might be some overlap in the handled cases.

fhahn added inline comments.Feb 4 2020, 1:46 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1687

Yes, we should definitely do this! I've updated it and there's a debug info test that passes now.


fhahn added a comment.Feb 7 2020, 12:00 PM

Some more minor comments, but I think this is a reasonable first version to check in.

That's great, thanks for all the feedback!

asbirlea accepted this revision.Feb 7 2020, 3:06 PM

I do think there are outstanding issues that need answers, but I believe the way to make progress is to have an initial good version and iterate on it.

  1. The major issue is performance, and to start testing this out we need a working version in tree.

There are many MemorySSA and AA calls in this variant that we may be able to do better on. For example: build a walker that does a single getClobbering with all the preconditions on what instructions are safe to skip, instead of doing this in a loop here.

  2. Add all missing cases from the add-on patches and eliminate at least as many stores as the current DSE. Merge tests when this happens to make this clear.
  3. The performance problem's scope goes beyond DSE. The current pass pipeline for both pass managers has a sequence of (gvn, memcpyopt, dse), where all of these use MemDepAnalysis. Switching DSE to MemorySSA may initially get worse compile times, as we need to build both MemDepAnalysis and MemorySSA, but switching all three (use NewGVN and port memcpyopt and dse) may be worthwhile. This is the long-term goal I have in mind.

Thanks for all the work on this!

This revision is now accepted and ready to land.Feb 7 2020, 3:06 PM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Feb 10 2020, 4:45 AM

I do think there are outstanding issues that need answers, but I believe the way to make progress is to have an initial good version and iterate on it.

Great, thanks for all the helpful comments!

  1. The major issue is performance, and to start testing this out we need a working version in tree.

There are many MemorySSA and AA calls in this variant that we may be able to do better on. For example: build a walker that does a single getClobbering with all the preconditions on what instructions are safe to skip, instead of doing this in a loop here.

Agreed! I think it makes sense to look into perf-tuning once we cover most cases legacy DSE handles. Otherwise perf comparisons might be a bit skewed.

  2. Add all missing cases from the add-on patches and eliminate at least as many stores as the current DSE. Merge tests when this happens to make this clear.

One missing piece that is not covered yet in the patch series is load->store forwarding. If anybody is interested in pushing this forward, that would be great! Otherwise I'll get to that in a bit.

  3. The performance problem's scope goes beyond DSE. The current pass pipeline for both pass managers has a sequence of (gvn, memcpyopt, dse), where all of these use MemDepAnalysis. Switching DSE to MemorySSA may initially get worse compile times, as we need to build both MemDepAnalysis and MemorySSA, but switching all three (use NewGVN and port memcpyopt and dse) may be worthwhile. This is the long-term goal I have in mind.

Yes, one follow-up to the MSSA-backed DSE is an MSSA-backed MemCpyOpt. My hope is that we can re-use/share some or most of the walking strategy & safety checks between DSE and MemCpyOpt.

Thanks for all the work on this!