This is an archive of the discontinued LLVM Phabricator instance.

Differential D72700

[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk).
ClosedPublic

Authored by fhahn on Jan 14 2020, 6:12 AM.

Details

Reviewers

dmgreen
rnk
efriedma
bryant
asbirlea
Tyker

Commits

rGd0c4d4fe0929: [DSE] Add first version of MemorySSA-backed DSE (Bottom up walk).

Summary

This patch adds a first version of a MemorySSA based DSE. It is missing
a lot of features, which will get added as follow-ups, to help to keep
the review manageable.

The patch uses the following general approach: given a MemoryDef, walk
upwards to find clobbering MemoryDefs that may be killed by the
starting def. Then check that there are no uses that may read the
location of the original MemoryDef in between both MemoryDefs. A bit
more concretely:

For all MemoryDefs StartDef:

1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards.
2. Check that there are no reads between DomAccess and the StartDef by checking all uses starting at DomAccess and walking until we see StartDef.
3. For each found DomDef, check that:
   1. There are no barrier instructions between DomDef and StartDef (like throws or stores with ordering constraints).
   2. StartDef is executed whenever DomDef is executed.
   3. StartDef completely overwrites DomDef.
4. Erase DomDef from the function and MemorySSA.
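
The steps above can be sketched on a toy model. The following is a minimal illustration, not the patch's implementation: it assumes a single straight-line block, models every access as a byte range in one buffer, and leaves out barriers and control flow. The names `Access` and `findDeadStores` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Toy model (an assumption for illustration): each access touches the
// half-open byte range [Begin, End) of a single buffer.
struct Access {
  enum Kind { Store, Load } K;
  int Begin;
  int End;
};

static bool overlaps(const Access &A, const Access &B) {
  return A.Begin < B.End && B.Begin < A.End;
}

static bool completelyOverwrites(const Access &Later, const Access &Earlier) {
  return Later.Begin <= Earlier.Begin && Earlier.End <= Later.End;
}

// Returns the indices of stores in a straight-line block that a later store
// kills. Barriers and control flow are deliberately left out.
std::vector<std::size_t> findDeadStores(const std::vector<Access> &Block) {
  std::vector<std::size_t> Dead;
  for (std::size_t Start = 0; Start < Block.size(); ++Start) {
    if (Block[Start].K != Access::Store)
      continue;
    // Walk upwards from StartDef to the nearest access touching its bytes.
    for (std::size_t I = Start; I-- > 0;) {
      const Access &Prev = Block[I];
      if (!overlaps(Prev, Block[Start]))
        continue; // Unrelated bytes: keep walking.
      // A load here reads the location, so nothing above it may be removed.
      // A store is removable only if StartDef overwrites all of its bytes.
      if (Prev.K == Access::Store && completelyOverwrites(Block[Start], Prev))
        Dead.push_back(I);
      break;
    }
  }
  return Dead;
}
```

In this model a store to bytes [0,4) followed by an unrelated store and load of [4,8) and then another store to [0,4) reports the first store as dead, while a load of [0,4) in between prevents the removal.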

The patch uses a very simple approach to guarantee that no throwing
instructions are between 2 stores: We only allow accesses to stack
objects, accesses that are in the same basic block if the block does not
contain any throwing instructions, or accesses in functions that do
not contain any throwing instructions. This restriction will get lifted later.
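
As a rough sketch of that conservative rule, assuming a simplified summary of each store: the struct `StoreInfo`, its fields, and the function `safeAcrossThrows` are hypothetical and do not mirror the patch's actual data structures.

```cpp
#include <set>

// Hypothetical summary of the facts the check needs.
struct StoreInfo {
  bool WritesStackObject; // the stored-to object is a stack object
  int Block;              // id of the containing basic block
};

// Returns true if, under the three conditions listed above, no throwing
// instruction can sit between the two stores in an observable way.
bool safeAcrossThrows(const StoreInfo &Earlier, const StoreInfo &Later,
                      const std::set<int> &ThrowingBlocks) {
  if (Earlier.WritesStackObject)
    return true; // accesses to stack objects are allowed
  if (ThrowingBlocks.empty())
    return true; // the function contains no throwing instructions
  // Same basic block, and that block contains no throwing instruction.
  return Earlier.Block == Later.Block &&
         ThrowingBlocks.count(Earlier.Block) == 0;
}
```

The order of the checks is a design choice of this sketch: the cheapest facts are tested first, and the block comparison is only reached for non-stack objects in functions that do throw somewhere.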

Besides adding support for the missing cases, there is plenty of additional
potential for improvements as follow-up work, e.g. the way we visit stores
(could be just a traversal of the MemorySSA, rather than collecting them
up-front), using the alias information discovered during walking to optimize
the MemorySSA.

This is loosely based on D40480 by Dave Green.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Jan 14 2020, 6:12 AM

Herald added a project: Restricted Project. Jan 14 2020, 6:12 AM

Herald added subscribers: asbirlea, jfb, george.burgess.iv and 2 others.

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Harbormaster failed remote builds in B43933: Diff 237958! Jan 14 2020, 6:17 AM

fhahn mentioned this in D72146: [DSE] Add first version of MemorySSA-backed DSE. Jan 14 2020, 6:31 AM

asbirlea added inline comments. Jan 14 2020, 12:37 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1477	MemorySSA should handle this, if it doesn't we should fix that.
1541	Nit: Use only DomDef, known to be non-null outside the loop. Restrict scope of DomAccess to inside the loop above.
1544	Walking uses may miss cases due to aliasing not being transitive. This needs to be thoroughly analyzed. Here's a very rough example.
  1 = Def(LoE) ; <----- DomDef stores [0,3]
  2 = Def(1) ; (2, 1)=NoAlias, stores [4,7]
  Use(2) ; MayAlias 2 and 1, the Use points to the first Def it may alias, loads [0, 7].
  3 = Def(1) ; <---- Current (3, 2)=NoAlias, (3,1)=MayAlias, stores [0,3]
The situation may be simplified due to handling stores, but all Uses may need looking at.
Note this will not work to recurse on uses of the uses either. Rough example why:
  1 = Def(LoE)
  2 = Def(1) ; <----- DomDef
  3 = Def(1) ; (3, 2)=NoAlias
  Use(3) ; MayAlias 3 and 2, the Use points to the first Def it may alias.
  4 = Def(2) ; <---- Current (4, 3)=NoAlias, (4,2)=MayAlias
1546	You're right this can be partially lifted. Here's an example:
  %a:
    1 = Def(LoE) ; <----- DomDef
    br %cond1, %b, %c
  %b:
    2 = Def(1)
    br %cond2, %d, %e
  %c:
    br %e
  %d:
    br %f
  %e:
    3 = phi(1,2)
    br %f
  %f:
    4 = Phi(2,3)
    5 = Def(4) ; <---- Current
1749	I don't see how this being in a loop will work. Shouldn't this be a "give up"? Example:
  ; 1 = MemoryDef (LoE)
  store a
  ; 2 = MemoryDef(1)
  call_reading_a_and_overwriting_a
  ; 3 = MemoryDef(2)
  store a
Once the getClobbering found a mayAlias of 3 with 2, even if an overwrite is not proven, def 1 cannot be removed. I may have missed such a check in `getDomMemoryDef`.
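
The non-transitivity in the first example of the 1544 comment can be reproduced with plain byte ranges. This is only an illustrative sketch: `ByteRange` and `mayAlias` are hypothetical stand-ins for an alias query, using the inclusive ranges [0,3], [4,7] and [0,7] from the comment.

```cpp
// Inclusive byte range touched by one memory access (toy model).
struct ByteRange {
  int First;
  int Last;
};

// Stand-in for an alias query: two accesses may alias if their ranges overlap.
bool mayAlias(ByteRange A, ByteRange B) {
  return A.First <= B.Last && B.First <= A.Last;
}

// The three accesses from the first example.
const ByteRange DomDef{0, 3};   // 1 = Def(LoE), stores [0,3]
const ByteRange OtherDef{4, 7}; // 2 = Def(1), stores [4,7]
const ByteRange Load{0, 7};     // Use(2), loads [0,7]
```

Here `mayAlias(DomDef, OtherDef)` is false while `mayAlias(Load, OtherDef)` and `mayAlias(Load, DomDef)` are both true, so a walk that skips def 2 because it does not alias DomDef would also skip the load hanging off def 2, even though that load reads DomDef's bytes.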

I did a few experiments with a bottom-up algorithm before the patch I showed on Phabricator. My bottom-up implementation had a similar average compile-time to the current pass on the test-suite, but it removed only barely more stores than the current pass.
I gave up on it because I didn't find a good way forward to make it deal with cases like the following:

  store i32 0, i32* %Ptr  ; DEAD
  br i1 %cond, label %a, label %b
a:
  store i32 2, i32* %Ptr
  br label %c
b:
  store i32 2, i32* %Ptr
  br label %c

which is IMO definitely something we want to do. By the way, GCC does this optimization: https://godbolt.org/z/VFBf-c
Maybe there is a bottom-up way to deal with this that I didn't think of. Any thoughts?
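
For readers less used to IR, the branching pattern corresponds roughly to source like the following. This is a hypothetical reconstruction, not the code behind the godbolt link, and the function name `storeBothPaths` is an assumption.

```cpp
// Hypothetical source for the branching pattern: the initial store to *Ptr is
// overwritten on both paths before anything reads it, so it is dead.
void storeBothPaths(int *Ptr, bool Cond) {
  *Ptr = 0; // dead: overwritten on both paths below
  if (Cond)
    *Ptr = 2;
  else
    *Ptr = 2;
}
```

Neither of the later stores post-dominates the first one on its own, which is why a walk that starts from a single killing store cannot prove the first store dead.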

Would you be interested in a port of my top-down algorithm from D72182 to this patch series' framework?
Since I wrote it, I have had many ideas for improving it, mostly on the compile-time side.

In D72700#1821571, @Tyker wrote:
I did a few experiments with a bottom-up algorithm before the patch I showed on Phabricator. My bottom-up implementation had a similar average compile-time to the current pass on the test-suite, but it removed only barely more stores than the current pass.
I gave up on it because I didn't find a good way forward to make it deal with cases like the following:
  store i32 0, i32* %Ptr  ; DEAD
  br i1 %cond, label %a, label %b
a:
  store i32 2, i32* %Ptr
  br label %c
b:
  store i32 2, i32* %Ptr
  br label %c
which is IMO definitely something we want to do. By the way, GCC does this optimization: https://godbolt.org/z/VFBf-c
Maybe there is a bottom-up way to deal with this that I didn't think of. Any thoughts?

Yes I think we should make sure to cover this case. One approach could be to optimize the defining accesses of the accesses we start our bottom-up walks with. After visiting all MemoryDefs, to detect cases like the one above, we would just have to look for MemoryDefs D with multiple MemoryDefs as users (which we could probably collect along the way). Then we would just have to check that D is not read in between. Does that sound sensible?

Would you be interested in a port of my top-down algorithm from D72182 to this patch series' framework?
Since I wrote it, I have had many ideas for improving it, mostly on the compile-time side.

Yes, but I think it would be good to settle the question bottom-up vs top-down.

fhahn marked 2 inline comments as done.Jan 15 2020, 6:44 AM

fhahn added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1544	Thanks for the example. I think I might be missing something for the optimized version though. From the example in MemorySSA.h:
  /// Given this form, all the stores that could ever effect the load at %8 can be
  /// gotten by using the MemoryUse associated with it, and walking from use to
  /// def until you hit the top of the function.
From the comment above, shouldn't we visit both 2 and 3 when walking up from Use(3), as both may change the read location? In the example we would only see 3 and 1.
But even assuming we would visit both 2 and 3, I think I can see how we could end up with scenarios where we could miss overlapping reads. I think we would have to take a look at all users of DomDef and we specifically cannot skip any non-aliasing MemoryDefs for the read-checks. That would make things more expensive, but would be something we have to do regardless of going bottom-up/top-down. Does that make sense to you?
However I think that would mean that in the worst-case, we would have to do a top-down walk similar to the general top-down approach for the read checks.
1749	I had another look at the getClobberingMemoryAccess, and we indeed need an additional check ensuring that the memorydef does not also read the original location!

Regarding handling the branching example, the solution fhahn proposes sounds reasonable to me.
I was thinking something similar, along the lines of: do the checks bottom-up from both second and third stores during the main algorithm, both find the dead store but do not postdominate it; don't abandon them, but keep this info (include the checks for no intervening reads) for after the main search. The data structure could be a Hashmap<PotentialDeadStore, ListOfDefsOrBlocksWhoFoundThisPotentialDeadStore>. If all paths are covered in the List for a PotentialDeadStore, then it's truly dead.
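
The "all paths are covered" test for one PotentialDeadStore could look roughly like this. It is a sketch under assumptions: the CFG is a toy successor map, `KillerBlocks` stands for the list of blocks recorded for the candidate, the read checks are assumed to have been done already, and `allPathsCovered` is a hypothetical name.

```cpp
#include <map>
#include <set>
#include <vector>

using BlockId = int;
// Successor lists of a toy CFG. A block with no successors is a function exit.
using CFG = std::map<BlockId, std::vector<BlockId>>;

// Returns true if every path from StoreBlock to a function exit passes through
// a block in KillerBlocks, i.e. a block whose store overwrites the candidate.
bool allPathsCovered(const CFG &G, BlockId StoreBlock,
                     const std::set<BlockId> &KillerBlocks) {
  std::set<BlockId> Visited{StoreBlock};
  std::vector<BlockId> Worklist{StoreBlock};
  while (!Worklist.empty()) {
    BlockId B = Worklist.back();
    Worklist.pop_back();
    auto It = G.find(B);
    if (It == G.end() || It->second.empty())
      return false; // Reached an exit without an overwriting store.
    for (BlockId Succ : It->second) {
      if (KillerBlocks.count(Succ))
        continue; // This path is covered.
      if (Visited.insert(Succ).second)
        Worklist.push_back(Succ);
    }
  }
  return true;
}
```

For the branching example (entry block 0 with successors 1 and 2, both falling through to exit block 3), the candidate in block 0 is covered when both 1 and 2 are recorded and not covered when only one of them is.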

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1544	I think I know what I missed. MemoryDefs keep two fields, the defining and the optimized access. So `3 = Def(1)` actually looks like this: `3 = Def(2) - Optimized 1` , and is a user of both 1 and 2. Yes, I think you're right that the read checks for all accesses are needed regardless of which approach is taken. And yes, it will be expensive. I came across something similar in LICM, and I limited or outright avoided analyzing all uses against a store to avoid the cost of analyzing all of them (see ~LICM.cpp:1300)

In D72700#1822594, @asbirlea wrote:

Regarding handling the branching example, the solution fhahn proposes sounds reasonable to me.
I was thinking something similar, along the lines of: do the checks bottom-up from both second and third stores during the main algorithm, both find the dead store but do not postdominate it; don't abandon them, but keep this info (include the checks for no intervening reads) for after the main search. The data structure could be a Hashmap<PotentialDeadStore, ListOfDefsOrBlocksWhoFoundThisPotentialDeadStore>. If all paths are covered in the List for a PotentialDeadStore, then it's truly dead.

this seems like it should work.

By the way, do you have any thoughts on adding all calls that may throw to the MemorySSA graph, even if they are specified to not access memory?
Being able to rely on this would greatly simplify and probably speed up any DSE algorithm. It is a corner case: readnone functions that may throw can't occur in C++, and I expect most other languages to be the same.

In D72700#1824724, @Tyker wrote:

In D72700#1822594, @asbirlea wrote:

Regarding handling the branching example, the solution fhahn proposes sounds reasonable to me.
I was thinking something similar, along the lines of: do the checks bottom-up from both second and third stores during the main algorithm, both find the dead store but do not postdominate it; don't abandon them, but keep this info (include the checks for no intervening reads) for after the main search. The data structure could be a Hashmap<PotentialDeadStore, ListOfDefsOrBlocksWhoFoundThisPotentialDeadStore>. If all paths are covered in the List for a PotentialDeadStore, then it's truly dead.

this seems like it should work.

Sounds good! I'll update this patch in the next few days to improve the read-clobber checks as suggested. I'll update the follow on patches to work with the bottom-up approach as well.

Address comments, rework to correctly check for reads between Current and DomAccess by visiting all uses (including non-aliasing MemoryDefs) until we reach the original def.

There are some cases where we might visit more uses than necessary (e.g. if Current does not post-dominate DomAccess), but I think it is better to get a correct and fairly complete version in and then tune for compile-time. I will also post a bunch of follow-up changes that implement various additional cases. I measured compile-time on X86 for the full patch series (covering a lot more cases) and without much tuning the worst compile-time difference is around 1.5%.

It would be great if you could have another look.

Harbormaster failed remote builds in B45379: Diff 241544! Jan 30 2020, 12:34 PM

fhahn added reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker. Jan 30 2020, 5:07 PM

fhahn edited the summary of this revision.

fhahn added a child revision: D72148: [DSE] Support traversing MemoryPhis. Jan 30 2020, 5:44 PM

fhahn mentioned this in D72628: [DSE] Move state for MemorySSA-drive DSE to DSEState.

In D72700#1824724, @Tyker wrote:

In D72700#1822594, @asbirlea wrote:

Regarding handling the branching example, the solution fhahn proposes sounds reasonable to me.
I was thinking something similar, along the lines of: do the checks bottom-up from both second and third stores during the main algorithm, both find the dead store but do not postdominate it; don't abandon them, but keep this info (include the checks for no intervening reads) for after the main search. The data structure could be a Hashmap<PotentialDeadStore, ListOfDefsOrBlocksWhoFoundThisPotentialDeadStore>. If all paths are covered in the List for a PotentialDeadStore, then it's truly dead.

this seems like it should work.

I've experimented with a different, more lightweight approach to support that case, which seems to fit more naturally: if we insert MemoryUses that clobber all locations visible to the caller just before each exit, we should be able to cover this case directly with the read checks: D73763

DaniilSuchkov added a subscriber: DaniilSuchkov.Jan 30 2020, 9:14 PM

Some small comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1468	s/modeled/modelled s/MemroySSA/MemorySSA
1470	Consider adding a cl::opt limit on the things you look to process in F. We've had this issue with generated code where a BB has on the order of thousands of stores.
1592	Reduce scope of DomAccess to inside the do-while loop and use DomDef here, hence no need for the `if (isa<MemoryDef>(DomAccess))` check - see previous comment (please mark them as done when updating).
1597	SmallSetVector to avoid processing duplicates?
1648	Can this happen? Wouldn't the getClobbering above have found UseAccess /UseInst instead of DomDef?
1691	s/arn't/aren't

Address comments, thanks!

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1468	I'm not a native speaker, but I think modeled is the US spelling and modelled is the UK spelling. I thought we prefer the US spelling, but I don't mind either way.
1470	I've added a rather generous limit for Defs per BB, but we can always adjust it later. We still have to check the whole basic block for throwing instructions.
1477	it's handled by MemorySSA, I've dropped that.
1544	I've added a test for the scenario: overlapping_read in multiblock-simple.ll
1546	Restrictions related to MemoryPhis are lifted in D72148
1597	Done. I think in the initial version the only nodes we might add multiple times are MemoryPhis and we bail out on the first one we see. I've added such a test to llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-memoryphis.ll
1648	I think it can happen in loops where a store in the header is killed by a store in the exit, but it is also stored to in the loop body. I've added a few additional test cases to llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-loops.ll (loop_multiple_def*). I've also moved the post-dominance check into the function and updated the check here to skip PushMemUses for MemoryDefs that completely overwrite DefLoc. This helps with avoiding unnecessary checks.

Harbormaster failed remote builds in B45591: Diff 242070! Feb 3 2020, 7:44 AM

asbirlea marked 4 inline comments as done. Feb 3 2020, 11:48 AM

asbirlea added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1468	Yes, you're right, I'm not a native speaker either. Thank you for the correction!
1470	sgtm
1597	I think when you process uses, you may find 2 different operands for a Def, if that Def is optimized. Something like:
  1 = Def(LoE)
  2 = Def (1)
  3 = Def (2) - Optimized field = 1
So if adding uses of 1, one may add (2, 3), then when processing 2, (3) is a use and added again. This may be too convoluted to actually happen, but I thought it worth having the safety net. Thank you for the update!
1648	I see, yes, I wasn't thinking with the MemoryPhi condition lifted.
1684	s/it's/its
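
The duplicate visit described in the 1597 comment, and the effect of a set-backed worklist, can be sketched with standard containers. This is an illustration only: `ToyDef` and `collectUsers` are hypothetical, and a `std::vector` paired with a `std::unordered_set` stands in for the SmallSetVector suggested in the review.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy MemoryDef with the two operands mentioned above. -1 means "none".
// Index 0 stands in for liveOnEntry.
struct ToyDef {
  int Defining;
  int Optimized;
};

// Collects every def reachable from Root through user edges, each only once.
// The vector plus the set mimic what a set-vector container provides.
std::vector<int> collectUsers(const std::vector<ToyDef> &Defs, int Root) {
  std::vector<int> Worklist{Root};
  std::unordered_set<int> Seen{Root};
  for (std::size_t I = 0; I < Worklist.size(); ++I) {
    int Cur = Worklist[I];
    for (int Id = 0; Id < static_cast<int>(Defs.size()); ++Id) {
      bool IsUser = Defs[Id].Defining == Cur || Defs[Id].Optimized == Cur;
      if (IsUser && Seen.insert(Id).second)
        Worklist.push_back(Id);
    }
  }
  Worklist.erase(Worklist.begin()); // Drop Root itself.
  return Worklist;
}
```

With the defs from the comment (2 defined by 1, 3 defined by 2 with optimized access 1), starting at def 1 yields 2 and 3 exactly once each, whereas a plain vector worklist would push def 3 a second time while processing def 2.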

Change it's -> its and regenerate check lines.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1597	Thanks for the example! That should be handled now :)
1684	Updated, thanks!

fhahn mentioned this in D72148: [DSE] Support traversing MemoryPhis. Feb 3 2020, 1:36 PM

Harbormaster failed remote builds in B45622: Diff 242172! Feb 3 2020, 1:52 PM

Some more minor comments, but I think this is a reasonable first version to check in.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1383	s/there/there are
1523	Can you add a comment here, along the lines: `UseInst has a MemoryDef associated in MemorySSA. It's possible for a MemoryDef to not write to memory, e.g. a volatile load is modeled as a MemoryDef.`
1692	The MemDep variant of DSE also attempts to keep debug info. Does this also make sense here?
  // Try to preserve debug information attached to the dead instruction.
  salvageDebugInfo(*DeadInst);
1769	Nit: Move declaration of Next inside while condition?

Just curious..

Can you compare this solution vs. GCC's solution vs. PDSE (https://reviews.llvm.org/D29866)?

Addressed latest comments, thanks!

In D72700#1857699, @xbolva00 wrote:

Just curious..

Can you compare this solution vs. GCC's solution vs. PDSE (https://reviews.llvm.org/D29866)?

I unfortunately do not have in-depth knowledge of either, but I think GCC uses a similar approach (I think GCC's virtual operands are similar to MemorySSA).

IIUC PDSE tries to remove partially redundant stores, by inserting/moving stores to split points. I think that is mostly orthogonal to the MemorySSA-backed DSE in this patch, although there might be some overlap in the handled cases.

fhahn added inline comments. Feb 4 2020, 1:46 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1692	Yes, we should definitely do this! I've updated it and there's a debug info test that passes now.

Harbormaster failed remote builds in B45717: Diff 242423! Feb 4 2020, 1:55 PM

In D72700#1857691, @asbirlea wrote:

Some more minor comments, but I think this is a reasonable first version to check in.

That's great, thanks for all the feedback!

I do think there are outstanding issues that need answers, but I believe the way to make progress is to have an initial good version and iterate on it.

The major issue is performance, and to start testing this out we need a working version in tree.

There are many MemorySSA and AA calls in this variant, that we may be able to do better. For example: build a walker to do a single getClobbering with all the preconditions on what instructions are safe to skip, instead of doing this in a loop here.

Add all missing cases from add-on patches and get parity or better stores eliminated than current DSE. Merge tests when this happens to make this clear.
The performance problem's scope goes beyond DSE. The current pass pipeline for both pass managers has a sequence of (gvn, memcpyopt, dse), where all of these use MemDepAnalysis. Switching DSE to MemorySSA may initially get worse compile times, as we need to build both MemDepAnalysis and MemorySSA, but switching all three (use newgvn and port memcpyopt and dse), may be worthwhile. This is the long term goal I have in mind.

Thanks for all the work on this!

This revision is now accepted and ready to land. Feb 7 2020, 3:06 PM

Closed by commit rGd0c4d4fe0929: [DSE] Add first version of MemorySSA-backed DSE (Bottom up walk). (authored by fhahn). Feb 10 2020, 3:52 AM

This revision was automatically updated to reflect the committed changes.

In D72700#1865066, @asbirlea wrote:

I do think there are outstanding issues that need answers, but I believe the way to make progress is to have an initial good version and iterate on it.

Great, thanks for all the helpful comments!

The major issue is performance, and to start testing this out we need a working version in tree.

There are many MemorySSA and AA calls in this variant, that we may be able to do better. For example: build a walker to do a single getClobbering with all the preconditions on what instructions are safe to skip, instead of doing this in a loop here.

Agreed! I think it makes sense to look into perf-tuning once we cover most cases legacy DSE handles. Otherwise perf comparisons might be a bit skewed.

Add all missing cases from add-on patches and get parity or better stores eliminated than current DSE. Merge tests when this happens to make this clear.

One missing piece that is not covered yet in the patch series is load->store forwarding. If anybody is interested in pushing this forward, that would be great! Otherwise I'll get to that in a bit.

The performance problem's scope goes beyond DSE. The current pass pipeline for both pass managers has a sequence of (gvn, memcpyopt, dse), where all of these use MemDepAnalysis. Switching DSE to MemorySSA may initially get worse compile times, as we need to build both MemDepAnalysis and MemorySSA, but switching all three (use newgvn and port memcpyopt and dse), may be worthwhile. This is the long term goal I have in mind.

Yes, one follow-up to MSSA backed DSE is MSSA backed MemCpyOpt. My hope is that we can re-use/share some/most of the walking strategy & safety checks between DSE and MemCpyOpt.

Thanks for all the work on this!

fhahn mentioned this in D73763: [DSE] Lift post-dominance restriction. Feb 10 2020, 7:29 AM

fhahn mentioned this in D72410: [DSE] Eliminate stores by terminators (free,lifetime.end). Feb 10 2020, 8:02 AM

fhahn mentioned this in D72631: [DSE] Eliminate stores at the end of the function. Feb 10 2020, 8:05 AM

Reverting in 42f8b915eb72364cc5e84adf58a2c2d4947e8b10 as this results in a use-after-free, see http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/38628/steps/check-llvm%20asan/logs/stdio

Revision Contents

Path	Size

llvm/lib/Transforms/Scalar/
  DeadStoreElimination.cpp	538 lines

llvm/test/Transforms/DeadStoreElimination/MSSA/
  2011-09-06-EndOfFunction.ll	1 line
  OverwriteStoreBegin.ll	1 line
  OverwriteStoreEnd.ll	1 line
  atomic.ll	1 line
  calloc-store.ll	2 lines
  fence-todo.ll	50 lines
  fence.ll	48 lines
  free.ll	2 lines
  inst-limits.ll	9 lines
  lifetime.ll	2 lines
  memcpy-complete-overwrite.ll	2 lines
  memintrinsics.ll	1 line
  memoryssa-scan-limit.ll	72 lines
  memset-and-memcpy.ll	9 lines
  memset-missing-debugloc.ll	1 line
  merge-stores-big-endian.ll	1 line
  merge-stores.ll	1 line
  multiblock-captures.ll	7 lines
  multiblock-exceptions.ll	1 line
  multiblock-loops.ll	114 lines
  multiblock-memoryphis.ll	70 lines
  multiblock-partial.ll	3 lines
  multiblock-simple.ll	41 lines
  operand-bundles.ll	1 line
  simple-todo.ll	1 line
  simple.ll	8 lines

Diff 243504

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

[... 23 lines not shown ...]
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/CaptureTracking.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/MemoryDependenceAnalysis.h"
 #include "llvm/Analysis/MemoryLocation.h"
+#include "llvm/Analysis/MemorySSA.h"
+#include "llvm/Analysis/MemorySSAUpdater.h"
 #include "llvm/Analysis/OrderedBasicBlock.h"
+#include "llvm/Analysis/PostDominators.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Argument.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/InstIterator.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/PassManager.h"
[... 31 lines not shown ...]
 EnablePartialOverwriteTracking("enable-dse-partial-overwrite-tracking",
   cl::init(true), cl::Hidden,
   cl::desc("Enable partial-overwrite tracking in DSE"));

 static cl::opt<bool>
 EnablePartialStoreMerging("enable-dse-partial-store-merging",
   cl::init(true), cl::Hidden,
   cl::desc("Enable partial store merging in DSE"));

-// Temporary dummy option for tests.
 static cl::opt<bool>
     EnableMemorySSA("enable-dse-memoryssa", cl::init(false), cl::Hidden,
                     cl::desc("Use the new MemorySSA-backed DSE."));

+static cl::opt<unsigned>
+    MemorySSAScanLimit("dse-memoryssa-scanlimit", cl::init(100), cl::Hidden,
+                       cl::desc("The number of memory instructions to scan for "
+                                "dead store elimination (default = 100)"));
+
+static cl::opt<unsigned> MemorySSADefsPerBlockLimit(
+    "dse-memoryssa-defs-per-block-limit", cl::init(5000), cl::Hidden,
+    cl::desc("The number of MemoryDefs we consider as candidates to eliminated "
+             "other stores per basic block (default = 5000)"));
+
 //===----------------------------------------------------------------------===//
 // Helper functions
 //===----------------------------------------------------------------------===//
 using OverlapIntervalsTy = std::map<int64_t, int64_t>;
 using InstOverlapIntervalsTy = DenseMap<Instruction *, OverlapIntervalsTy>;

 /// Delete this instruction. Before we do, go through and zero out all the
 /// operands of this instruction. If any of them become dead, delete them and
[... 1,246 lines not shown ...]
   for (BasicBlock &BB : F)
     // Only check non-dead blocks. Dead blocks may have strange pointer
     // cycles that will confuse alias analysis.
     if (DT->isReachableFromEntry(&BB))
       MadeChange |= eliminateDeadStores(BB, AA, MD, DT, TLI);

   return MadeChange;
 }

		namespace {
		//=============================================================================
		// MemorySSA backed dead store elimination.
		//
		// The code below implements dead store elimination using MemorySSA. It uses
		// the following general approach: given a MemoryDef, walk upwards to find
		// clobbering MemoryDefs that may be killed by the starting def. Then check
		// that there are no uses that may read the location of the original MemoryDef
		// in between both MemoryDefs. A bit more concretely:
		//
		// For all MemoryDefs StartDef:
		// 1. Get the next dominating clobbering MemoryDef (DomAccess) by walking
		// upwards.
		// 2. Check that there are no reads between DomAccess and the StartDef by
		asbirleaUnsubmitted Done Reply Inline Actions s/there/there are asbirlea: s/there/there are
		// checking all uses starting at DomAccess and walking until we see StartDef.
		// 3. For each found DomDef, check that:
		// 1. There are no barrier instructions between DomDef and StartDef (like
		// throws or stores with ordering constraints).
		// 2. StartDef is executed whenever DomDef is executed.
		// 3. StartDef completely overwrites DomDef.
		// 4. Erase DomDef from the function and MemorySSA.

		// Returns true if \p M is an intrisnic that does not read or write memory.
		bool isNoopIntrinsic(MemoryUseOrDef *M) {
		if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(M->getMemoryInst())) {
		switch (II->getIntrinsicID()) {
		case Intrinsic::lifetime_start:
		case Intrinsic::lifetime_end:
		case Intrinsic::invariant_end:
		case Intrinsic::launder_invariant_group:
		case Intrinsic::assume:
		return true;
		case Intrinsic::dbg_addr:
		case Intrinsic::dbg_declare:
		case Intrinsic::dbg_label:
		case Intrinsic::dbg_value:
		llvm_unreachable("Intrinsic should not be modeled in MemorySSA");
		default:
		return false;
		}
		}
		return false;
		}

		// Check if we can ignore \p D for DSE.
		bool canSkipDef(MemoryDef *D, bool DefVisibleToCaller) {
		Instruction *DI = D->getMemoryInst();
		// Calls that only access inaccessible memory cannot read or write any memory
		// locations we consider for elimination.
		if (auto CS = CallSite(DI))
		if (CS.onlyAccessesInaccessibleMemory())
		return true;

		// We can eliminate stores to locations not visible to the caller across
		// throwing instructions.
		if (DI->mayThrow() && !DefVisibleToCaller)
		return true;

		// We can remove the dead stores, irrespective of the fence and its ordering
		// (release/acquire/seq_cst). Fences only constrain the ordering of
		// already visible stores; they do not make a store visible to other
		// threads. So, skipping over a fence does not change a store from being
		// dead.
		if (isa<FenceInst>(DI))
		return true;

		// Skip intrinsics that do not really read or modify memory.
		if (isNoopIntrinsic(D))
		return true;

		return false;
		}

		struct DSEState {
		Function &F;
		AliasAnalysis &AA;
		MemorySSA &MSSA;
		DominatorTree &DT;
		PostDominatorTree &PDT;
		const TargetLibraryInfo &TLI;

		// All MemoryDefs that potentially could kill other MemDefs.
		SmallVector<MemoryDef *, 64> MemDefs;
		// Any that should be skipped as they are already deleted
		SmallPtrSet<MemoryAccess *, 4> SkipStores;
		// Keep track of all of the objects that are invisible to the caller until the
		// function returns.
		SmallPtrSet<const Value *, 16> InvisibleToCaller;
		// Keep track of blocks with throwing instructions not modeled in MemorySSA.
		SmallPtrSet<BasicBlock *, 16> ThrowingBlocks;

		DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,
		PostDominatorTree &PDT, const TargetLibraryInfo &TLI)
		: F(F), AA(AA), MSSA(MSSA), DT(DT), PDT(PDT), TLI(TLI) {}

		static DSEState get(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,
		DominatorTree &DT, PostDominatorTree &PDT,
		const TargetLibraryInfo &TLI) {
		DSEState State(F, AA, MSSA, DT, PDT, TLI);
		asbirlea (Done): s/modeled/modelled s/MemroySSA/MemorySSA
		fhahn (Done): I'm not a native speaker, but I think modeled is the US spelling and modelled is the UK spelling. I thought we prefer the US spelling, but I don't mind either way.
		asbirlea (Done): Yes, you're right, I'm not a native speaker either. Thank you for the correction!
		// Collect blocks with throwing instructions not modeled in MemorySSA and
		// alloc-like objects.
		asbirlea (Not Done): Consider adding a cl::opt limit on the things you look to process in F. We've had this issue with generated code where a BB has on the order of thousands of stores.
		fhahn (Done): I've added a rather generous limit for Defs per BB, but we can always adjust it later. We still have to check the whole basic block for throwing instructions.
		asbirlea (Done): sgtm
		for (Instruction &I : instructions(F)) {
		if (I.mayThrow() && !MSSA.getMemoryAccess(&I))
		State.ThrowingBlocks.insert(I.getParent());

		auto *MD = dyn_cast_or_null<MemoryDef>(MSSA.getMemoryAccess(&I));
		if (MD && State.MemDefs.size() < MemorySSADefsPerBlockLimit &&
		hasAnalyzableMemoryWrite(&I, TLI) && isRemovable(&I))
		asbirlea (Done): MemorySSA should handle this, if it doesn't we should fix that.
		fhahn (Done): It's handled by MemorySSA, I've dropped that.
		State.MemDefs.push_back(MD);

		// Track alloca and alloca-like objects. Here we care about objects not
		// visible to the caller during function execution. Alloca objects are
		// invalid in the caller, for alloca-like objects we ensure that they are
		// not captured throughout the function.
		if (isa<AllocaInst>(&I) ||
		(isAllocLikeFn(&I, &TLI) && !PointerMayBeCaptured(&I, false, true)))
		State.InvisibleToCaller.insert(&I);
		}
		// Treat byval or inalloca arguments the same as Allocas, stores to them are
		// dead at the end of the function.
		for (Argument &AI : F.args())
		if (AI.hasByValOrInAllocaAttr())
		State.InvisibleToCaller.insert(&AI);
		return State;
		}

		Optional<MemoryLocation> getLocForWriteEx(Instruction *I) const {
		if (!I->mayWriteToMemory())
		return None;

		if (auto *MTI = dyn_cast<AnyMemIntrinsic>(I))
		return {MemoryLocation::getForDest(MTI)};

		if (auto CS = CallSite(I)) {
		if (Function *F = CS.getCalledFunction()) {
		StringRef FnName = F->getName();
		if (TLI.has(LibFunc_strcpy) && FnName == TLI.getName(LibFunc_strcpy))
		return {MemoryLocation(CS.getArgument(0))};
		if (TLI.has(LibFunc_strncpy) && FnName == TLI.getName(LibFunc_strncpy))
		return {MemoryLocation(CS.getArgument(0))};
		if (TLI.has(LibFunc_strcat) && FnName == TLI.getName(LibFunc_strcat))
		return {MemoryLocation(CS.getArgument(0))};
		if (TLI.has(LibFunc_strncat) && FnName == TLI.getName(LibFunc_strncat))
		return {MemoryLocation(CS.getArgument(0))};
		}
		return None;
		}

		return MemoryLocation::getOrNone(I);
		}

		/// Returns true if \p Use completely overwrites \p DefLoc.
		bool isCompleteOverwrite(MemoryLocation DefLoc, Instruction *UseInst) const {
		// UseInst has a MemoryDef associated in MemorySSA. It's possible for a
		asbirlea (Done): Can you add a comment here, along the lines: `UseInst has a MemoryDef associated in MemorySSA. It's possible for a MemoryDef to not write to memory, e.g. a volatile load is modeled as a MemoryDef.`
		// MemoryDef to not write to memory, e.g. a volatile load is modeled as a
		// MemoryDef.
		if (!UseInst->mayWriteToMemory())
		return false;

		if (auto CS = CallSite(UseInst))
		if (CS.onlyAccessesInaccessibleMemory())
		return false;

		ModRefInfo MR = AA.getModRefInfo(UseInst, DefLoc);
		// If necessary, perform additional analysis.
		if (isModSet(MR))
		MR = AA.callCapturesBefore(UseInst, DefLoc, &DT);

		Optional<MemoryLocation> UseLoc = getLocForWriteEx(UseInst);
		return isModSet(MR) && isMustSet(MR) &&
		UseLoc->Size.getValue() >= DefLoc.Size.getValue();
		}
		asbirlea (Done): Nit: Use only DomDef, known to be non-null outside the loop. Restrict scope of DomAccess to inside the loop above.

		/// Returns true if \p Use may read from \p DefLoc.
		bool isReadClobber(MemoryLocation DefLoc, Instruction *UseInst) const {
		asbirlea (Not Done): Walking uses may miss cases due to aliasing not being transitive. This needs to be thoroughly analyzed. Here's a very rough example.
		    1 = Def(LoE) ; <----- DomDef stores [0,3]
		    2 = Def(1)   ; (2, 1)=NoAlias, stores [4,7]
		    Use(2)       ; MayAlias 2 and 1, the Use points to the first Def it may alias, loads [0, 7].
		    3 = Def(1)   ; <---- Current (3, 2)=NoAlias, (3,1)=MayAlias, stores [0,3]
		The situation may be simplified due to handling stores, but all Uses may need looking at. Note this will not work to recurse on uses of the uses either. Rough example why:
		    1 = Def(LoE)
		    2 = Def(1) ; <----- DomDef
		    3 = Def(1) ; (3, 2)=NoAlias
		    Use(3)     ; MayAlias 3 and 2, the Use points to the first Def it may alias.
		    4 = Def(2) ; <---- Current (4, 3)=NoAlias, (4,2)=MayAlias
		fhahn (Done): Thanks for the example. I think I might be missing something for the optimized version though. From the example in MemorySSA.h:
		    /// Given this form, all the stores that could ever effect the load at %8 can be
		    /// gotten by using the MemoryUse associated with it, and walking from use to
		    /// def until you hit the top of the function.
		From the comment above, shouldn't we visit both 2 and 3 when walking up from Use(3), as both may change the read location? In the example we would only see 3 and 1. But even assuming we would visit both 2 and 3, I think I can see how we could end up with scenarios where we could miss overlapping reads. I think we would have to take a look at all users of DomDef and we specifically cannot skip any non-aliasing MemoryDefs for the read-checks. That would make things more expensive, but would be something we have to do regardless of going bottom-up/top-down. Does that make sense to you? However I think that would mean that in the worst-case, we would have to do a top-down walk similar to the general top-down approach for the read checks.
		asbirlea (Done): I think I know what I missed. MemoryDefs keep two fields, the defining and the optimized access. So `3 = Def(1)` actually looks like this: `3 = Def(2) - Optimized 1`, and is a user of both 1 and 2. Yes, I think you're right that the read checks for all accesses are needed regardless of which approach is taken. And yes, it will be expensive. I came across something similar in LICM, and I limited or outright avoided analyzing all uses against a store to avoid the cost of analyzing all of them (see ~LICM.cpp:1300)
		fhahn (Done): I've added a test for the scenario: overlapping_read in multiblock-simple.ll
		if (!UseInst->mayReadFromMemory())
		return false;
		asbirlea (Done): You're right this can be partially lifted. Here's an example:
		    %a:
		      1 = Def(LoE) ; <----- DomDef
		      br %cond1, %b, %c
		    %b:
		      2 = Def(1)
		      br %cond2, %d, %e
		    %c:
		      br %e
		    %d:
		      br %f
		    %e:
		      3 = phi(1,2)
		      br %f
		    %f:
		      4 = Phi(2,3)
		      5 = Def(4) ; <---- Current
		fhahn (Done): Restrictions related to MemoryPhis are lifted in D72148

		if (auto CS = CallSite(UseInst))
		if (CS.onlyAccessesInaccessibleMemory())
		return false;

		ModRefInfo MR = AA.getModRefInfo(UseInst, DefLoc);
		// If necessary, perform additional analysis.
		if (isRefSet(MR))
		MR = AA.callCapturesBefore(UseInst, DefLoc, &DT);
		return isRefSet(MR);
		}

		// Find a MemoryDef writing to \p DefLoc and dominating \p Current, with no
		// read access in between or return None otherwise. The returned value may not
		// (completely) overwrite \p DefLoc. Currently we bail out when we encounter
		// any of the following
		// * An aliasing MemoryUse (read).
		// * A MemoryPHI.
		Optional<MemoryAccess *> getDomMemoryDef(MemoryDef *KillingDef,
		MemoryAccess *Current,
		MemoryLocation DefLoc,
		bool DefVisibleToCaller,
		int &ScanLimit) const {
		MemoryDef *DomDef;
		MemoryAccess *StartDef = Current;
		bool StepAgain;
		LLVM_DEBUG(dbgs() << " trying to get dominating access for " << *Current
		<< "\n");
		// Find the next clobbering Mod access for DefLoc, starting at Current.
		do {
		StepAgain = false;
		// Reached TOP.
		if (MSSA.isLiveOnEntryDef(Current))
		return None;

		MemoryUseOrDef *CurrentUD = dyn_cast<MemoryUseOrDef>(Current);
		if (!CurrentUD)
		return None;

		// Look for accesses that clobber DefLoc.
		MemoryAccess *DomAccess =
		MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(
		CurrentUD->getDefiningAccess(), DefLoc);
		DomDef = dyn_cast<MemoryDef>(DomAccess);
		if (!DomDef || MSSA.isLiveOnEntryDef(DomDef))
		return None;
		asbirlea (Done): Reduce scope of DomAccess to inside the do-while loop and use DomDef here, hence no need for the `if (isa<MemoryDef>(DomAccess))` check - see previous comment (please mark them as done when updating).

		// Check if we can skip DomDef for DSE. We also require the KillingDef
		// execute whenever DomDef executes and use post-dominance to ensure that.
		if (canSkipDef(DomDef, DefVisibleToCaller) ||
		!PDT.dominates(KillingDef->getBlock(), DomDef->getBlock())) {
		asbirlea (Done): SmallSetVector to avoid processing duplicates?
		fhahn (Not Done): Done. I think in the initial version the only nodes we might add multiple times are MemoryPhis and we bail out on the first one we see. I've added such a test to llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-memoryphis.ll
		asbirlea (Done): I think when you process uses, you may find 2 different operands for a Def, if that Def is optimized. Something like:
		    1 = Def(LoE)
		    2 = Def (1)
		    3 = Def (2) - Optimized field = 1
		So if adding uses of 1, one may add (2, 3), then when processing 2, (3) is a use and added again. This may be too convoluted to actually happen, but thought it's worth having the safety net. Thank you for the update!
		fhahn (Done): Thanks for the example! That should be handled now :)
		StepAgain = true;
		Current = DomDef;
		}

		} while (StepAgain);

		LLVM_DEBUG(dbgs() << " Checking for reads of " << *DomDef << " ("
		<< *DomDef->getMemoryInst() << ")\n");

		SmallSetVector<MemoryAccess *, 32> WorkList;
		auto PushMemUses = [&WorkList](MemoryAccess *Acc) {
		for (Use &U : Acc->uses())
		WorkList.insert(cast<MemoryAccess>(U.getUser()));
		};
		PushMemUses(DomDef);

		// Check if DomDef may be read.
		for (unsigned I = 0; I < WorkList.size(); I++) {
		MemoryAccess *UseAccess = WorkList[I];

		LLVM_DEBUG(dbgs() << " Checking use " << *UseAccess);
		if (--ScanLimit == 0) {
		LLVM_DEBUG(dbgs() << " ... hit scan limit\n");
		return None;
		}

		// Bail out on MemoryPhis for now.
		if (isa<MemoryPhi>(UseAccess)) {
		LLVM_DEBUG(dbgs() << " ... hit MemoryPhi\n");
		return None;
		}

		Instruction *UseInst = cast<MemoryUseOrDef>(UseAccess)->getMemoryInst();
		LLVM_DEBUG(dbgs() << " (" << *UseInst << ")\n");

		if (isNoopIntrinsic(cast<MemoryUseOrDef>(UseAccess))) {
		PushMemUses(UseAccess);
		continue;
		}

		// Uses which may read the original MemoryDef mean we cannot eliminate the
		// original MD. Stop walk.
		if (isReadClobber(DefLoc, UseInst)) {
		LLVM_DEBUG(dbgs() << " ... found read clobber\n");
		return None;
		}

		if (StartDef == UseAccess)
		continue;

		// Check all uses for MemoryDefs, except for defs completely overwriting
		asbirlea (Done): Can this happen? Wouldn't the getClobbering above have found UseAccess/UseInst instead of DomDef?
		fhahn (Done): I think it can happen in loops where a store in the header is killed by a store in the exit, but it is also stored to in the loop body. I've added a few additional test cases to llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-loops.ll (loop_multiple_def*). I've also moved the post dominance check into the function and also updated the check here to skip PushMemUses for MemoryDefs that completely overwrite DefLoc. This helps with avoiding unnecessary checks.
		asbirlea (Done): I see, yes, I wasn't thinking with the MemoryPhi condition lifted.
		// the original location. Otherwise we have to check uses of all
		// MemoryDefs we discover, including non-aliasing ones. Otherwise we might
		// miss cases like the following
		// 1 = Def(LoE) ; <----- DomDef stores [0,1]
		// 2 = Def(1) ; (2, 1) = NoAlias, stores [2,3]
		// Use(2) ; MayAlias 2 and 1, loads [0, 3].
		// (The Use points to the first Def it may alias)
		// 3 = Def(1) ; <---- Current (3, 2) = NoAlias, (3,1) = MayAlias,
		// stores [0,1]
		if (MemoryDef *UseDef = dyn_cast<MemoryDef>(UseAccess)) {
		if (!isCompleteOverwrite(DefLoc, UseInst))
		PushMemUses(UseDef);
		}
		}

		// No aliasing MemoryUses of DomDef found, DomDef is potentially dead.
		return {DomDef};
		}
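The read check above, including the non-transitive aliasing case from the in-code comment, can be sketched on a toy def-use graph. This is an illustrative model only: `ToyAccess` and `mayBeReadBeforeKill` are hypothetical names, aliasing is reduced to byte-range overlap, defs are assumed never to read, and a `std::vector` plus `std::set` stands in for `SmallSetVector`. The `SkipNonAliasingDefs` flag exists purely to show what goes wrong when non-aliasing defs are not expanded.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy MemorySSA access covering the byte range [Begin, End).
struct ToyAccess {
  bool IsDef;             // true: MemoryDef-like, false: MemoryUse-like
  int Begin;
  int End;
  std::vector<int> Users; // indices of accesses that use this one
};

static bool overlaps(int B1, int E1, int B2, int E2) {
  return B1 < E2 && B2 < E1;
}

// Returns true if some use reachable from DomDef may read DomDef's location.
bool mayBeReadBeforeKill(const std::vector<ToyAccess> &Accs, int DomDef,
                         int KillingDef, bool SkipNonAliasingDefs) {
  const ToyAccess &Loc = Accs[DomDef];
  std::vector<int> WorkList; // insertion-ordered
  std::set<int> Seen;        // avoids processing duplicates
  auto PushUsers = [&](int Idx) {
    for (int U : Accs[Idx].Users)
      if (Seen.insert(U).second)
        WorkList.push_back(U);
  };
  PushUsers(DomDef);

  // Index-based loop because the worklist grows while we iterate.
  for (std::size_t I = 0; I < WorkList.size(); ++I) {
    int Cur = WorkList[I];
    const ToyAccess &A = Accs[Cur];
    bool Aliases = overlaps(A.Begin, A.End, Loc.Begin, Loc.End);
    if (!A.IsDef) {
      if (Aliases)
        return true; // Found a read clobber.
      continue;
    }
    if (Cur == KillingDef)
      continue;
    bool CompleteOverwrite = A.Begin <= Loc.Begin && Loc.End <= A.End;
    if (CompleteOverwrite)
      continue; // Later reads see this def, not DomDef.
    if (SkipNonAliasingDefs && !Aliases)
      continue; // Unsafe shortcut, kept only for comparison.
    PushUsers(Cur);
  }
  return false;
}
```

With the four accesses from the comment (DomDef storing [0,1], a non-aliasing def storing [2,3], a use loading [0,3], and the killing def), the walk only finds the read when it also expands the non-aliasing def.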

		// Delete dead memory defs
		void deleteDeadInstruction(Instruction *SI) {
		MemorySSAUpdater Updater(&MSSA);
		SmallVector<Instruction *, 32> NowDeadInsts;
		NowDeadInsts.push_back(SI);
		--NumFastOther;

		while (!NowDeadInsts.empty()) {
		Instruction *DeadInst = NowDeadInsts.pop_back_val();
		++NumFastOther;

		// Try to preserve debug information attached to the dead instruction.
		salvageDebugInfo(*DeadInst);

		// Remove the Instruction from MSSA.
		if (MemoryAccess *MA = MSSA.getMemoryAccess(DeadInst)) {
		Updater.removeMemoryAccess(MA);
		asbirlea (Done): s/it's/its
		fhahn (Done): Updated, thanks!
		if (MemoryDef *MD = dyn_cast<MemoryDef>(MA)) {
		SkipStores.insert(MD);
		}
		}

		// Remove its operands
		for (Use &O : DeadInst->operands())
		asbirlea (Done): s/arn't/aren't
		if (Instruction *OpI = dyn_cast<Instruction>(O)) {
		asbirlea (Not Done): The MemDep variant of DSE also attempts to keep debug info. Does this also make sense here?
		    // Try to preserve debug information attached to the dead instruction.
		    salvageDebugInfo(*DeadInst);
		fhahn (Done): Yes, we should definitely do this! I've updated it and there's a debug info test that passes now.
		O = nullptr;
		if (isInstructionTriviallyDead(OpI, &TLI))
		NowDeadInsts.push_back(OpI);
		}

		DeadInst->eraseFromParent();
		}
		}
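The cascading deletion in `deleteDeadInstruction` follows a common worklist pattern: erase one instruction, drop its operand uses, and queue any operand that thereby becomes trivially dead. A minimal sketch of that pattern, assuming a hypothetical `ToyInst` with an explicit use count and a side-effect flag in place of `isInstructionTriviallyDead` (MemorySSA and debug-info updates are left out):

```cpp
#include <vector>

// Hypothetical toy instruction; not an LLVM type.
struct ToyInst {
  std::vector<int> Operands;   // indices of instructions used as operands
  int NumUses = 0;             // how many operand slots refer to this one
  bool HasSideEffects = false; // such instructions are never trivially dead
  bool Erased = false;
};

// Erases Start and, transitively, operands that become unused.
// Returns the number of instructions erased.
int eraseWithDeadOperands(std::vector<ToyInst> &Insts, int Start) {
  std::vector<int> NowDead;
  NowDead.push_back(Start);
  int NumErased = 0;
  while (!NowDead.empty()) {
    int Dead = NowDead.back();
    NowDead.pop_back();
    if (Insts[Dead].Erased)
      continue;
    // Drop the operand uses, then check whether each operand is now dead.
    for (int Op : Insts[Dead].Operands) {
      ToyInst &OpI = Insts[Op];
      --OpI.NumUses;
      if (OpI.NumUses == 0 && !OpI.HasSideEffects && !OpI.Erased)
        NowDead.push_back(Op);
    }
    Insts[Dead].Operands.clear();
    Insts[Dead].Erased = true;
    ++NumErased;
  }
  return NumErased;
}
```

Erasing a store whose address computation has no other user removes both, while an operand that still has another user survives with its use count reduced by one.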

		// Check for any extra throws between SI and NI that block DSE. This only
		// checks extra maythrows (those that aren't MemoryDef's). MemoryDef that may
		// throw are handled during the walk from one def to the next.
		bool mayThrowBetween(Instruction *SI, Instruction *NI,
		const Value *SILocUnd) const {
		// First see if we can ignore it by using the fact that SI is an
		// alloca/alloca like object that is not visible to the caller during
		// execution of the function.
		if (SILocUnd && InvisibleToCaller.count(SILocUnd))
		return false;

		if (SI->getParent() == NI->getParent())
		return ThrowingBlocks.find(SI->getParent()) != ThrowingBlocks.end();
		return !ThrowingBlocks.empty();
		}

		// Check if \p NI acts as a DSE barrier for \p SI. The following instructions
		// act as barriers:
		// * A memory instruction that may throw and \p SI accesses a non-stack
		// object.
		// * Atomic stores stronger than monotonic.
		bool isDSEBarrier(Instruction *SI, MemoryLocation &SILoc,
		const Value *SILocUnd, Instruction *NI,
		MemoryLocation &NILoc) const {
		// If NI may throw it acts as a barrier, unless we are writing to an
		// alloca/alloca-like object that does not escape.
		if (NI->mayThrow() && !InvisibleToCaller.count(SILocUnd))
		return true;

		if (NI->isAtomic()) {
		if (auto *NSI = dyn_cast<StoreInst>(NI)) {
		if (isStrongerThanMonotonic(NSI->getOrdering()))
		return true;
		} else
		llvm_unreachable(
		"Other instructions should be modeled/skipped in MemorySSA");
		}

		return false;
		}
		};
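The two barrier conditions in `isDSEBarrier` can be restated in isolation. The sketch below is a simplified stand-in, not the LLVM API: `ToyOrdering` mirrors the usual ordering names, and `toyIsStrongerThanMonotonic` and `isToyBarrier` are hypothetical helpers that take plain flags instead of instructions.

```cpp
// Hypothetical stand-in for an atomic ordering enum; only relative strength
// with respect to Monotonic matters for this sketch.
enum class ToyOrdering {
  NotAtomic,
  Unordered,
  Monotonic,
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

bool toyIsStrongerThanMonotonic(ToyOrdering O) {
  switch (O) {
  case ToyOrdering::Acquire:
  case ToyOrdering::Release:
  case ToyOrdering::AcquireRelease:
  case ToyOrdering::SequentiallyConsistent:
    return true;
  default:
    return false;
  }
}

// A candidate store blocks elimination if it may throw while the killed
// location is visible to the caller, or if it is an atomic store with an
// ordering stronger than monotonic.
bool isToyBarrier(bool MayThrow, bool LocInvisibleToCaller,
                  ToyOrdering StoreOrdering) {
  if (MayThrow && !LocInvisibleToCaller)
    return true;
  return toyIsStrongerThanMonotonic(StoreOrdering);
}
```

Under these assumptions a throwing instruction is harmless for a non-escaping stack object, whereas a release store is a barrier regardless of where the killed location lives.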

		bool eliminateDeadStoresMemorySSA(Function &F, AliasAnalysis &AA,
		MemorySSA &MSSA, DominatorTree &DT,
		PostDominatorTree &PDT,
		const TargetLibraryInfo &TLI) {
		const DataLayout &DL = F.getParent()->getDataLayout();
		bool MadeChange = false;
		asbirlea (Done): I don't see how this being in a loop will work. Shouldn't this be a "give up"? Example:
		    ; 1 = MemoryDef (LoE)
		    store a
		    ; 2 = MemoryDef(1)
		    call_reading_a_and_overwriting_a
		    ; 3 = MemoryDef(2)
		    store a
		Once the getClobbering found a mayAlias of 3 with 2, even if an overwrite is not proven, def 1 cannot be removed. I may have missed such a check in `getDomMemoryDef`.
		fhahn (Done): I had another look at the getClobberingMemoryAccess, and we indeed need an additional check ensuring that the memorydef does not also read the original location!

		DSEState State = DSEState::get(F, AA, MSSA, DT, PDT, TLI);
		// For each store:
		for (unsigned I = 0; I < State.MemDefs.size(); I++) {
		MemoryDef *Current = State.MemDefs[I];
		if (State.SkipStores.count(Current))
		continue;
		Instruction *SI = cast<MemoryDef>(Current)->getMemoryInst();
		auto MaybeSILoc = State.getLocForWriteEx(SI);
		if (!MaybeSILoc) {
		LLVM_DEBUG(dbgs() << "Failed to find analyzable write location for "
		<< *SI << "\n");
		continue;
		}
		MemoryLocation SILoc = *MaybeSILoc;
		assert(SILoc.Ptr && "SILoc should not be null");
		const Value *SILocUnd = GetUnderlyingObject(SILoc.Ptr, DL);
		Instruction *DefObj =
		const_cast<Instruction *>(dyn_cast<Instruction>(SILocUnd));
		bool DefVisibleToCaller = !State.InvisibleToCaller.count(SILocUnd);
		asbirlea (Done): Nit: Move declaration of Next inside while condition?
		if (DefObj && ((isAllocLikeFn(DefObj, &TLI) &&
		!PointerMayBeCapturedBefore(DefObj, false, true, SI, &DT))))
		DefVisibleToCaller = false;

		LLVM_DEBUG(dbgs() << "Trying to eliminate MemoryDefs killed by " << *SI
		<< "\n");

		int ScanLimit = MemorySSAScanLimit;
		MemoryDef *StartDef = Current;
		// Walk MemorySSA upward to find MemoryDefs that might be killed by SI.
		while (Optional<MemoryAccess *> Next = State.getDomMemoryDef(
		StartDef, Current, SILoc, DefVisibleToCaller, ScanLimit)) {
		MemoryAccess *DomAccess = *Next;
		LLVM_DEBUG(dbgs() << " Checking if we can kill " << *DomAccess << "\n");
		MemoryDef *NextDef = dyn_cast<MemoryDef>(DomAccess);
		Instruction *NI = NextDef->getMemoryInst();
		LLVM_DEBUG(dbgs() << " def " << *NI << "\n");

		if (!hasAnalyzableMemoryWrite(NI, TLI))
		break;
		MemoryLocation NILoc = *State.getLocForWriteEx(NI);
		// Check for anything that looks like it will be a barrier to further
		// removal
		if (State.isDSEBarrier(SI, SILoc, SILocUnd, NI, NILoc)) {
		LLVM_DEBUG(dbgs() << " stop, barrier\n");
		break;
		}

		// Before we try to remove anything, check for any extra throwing
		// instructions that block us from DSEing
		if (State.mayThrowBetween(SI, NI, SILocUnd)) {
		LLVM_DEBUG(dbgs() << " stop, may throw!\n");
		break;
		}

		// Check if NI overwrites SI.
		int64_t InstWriteOffset, DepWriteOffset;
		InstOverlapIntervalsTy IOL;
		OverwriteResult OR = isOverwrite(SILoc, NILoc, DL, TLI, DepWriteOffset,
		InstWriteOffset, NI, IOL, AA, &F);

		if (OR == OW_Complete) {
		LLVM_DEBUG(dbgs() << "DSE: Remove Dead Store:\n DEAD: " << *NI
		<< "\n KILLER: " << *SI << '\n');
		State.deleteDeadInstruction(NI);
		++NumFastStores;
		MadeChange = true;
		} else
		Current = NextDef;
		}
		}

		return MadeChange;
		}
		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// DSE Pass		// DSE Pass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {		PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {
AliasAnalysis *AA = &AM.getResult<AAManager>(F);		AliasAnalysis &AA = AM.getResult<AAManager>(F);
DominatorTree *DT = &AM.getResult<DominatorTreeAnalysis>(F);		const TargetLibraryInfo &TLI = AM.getResult<TargetLibraryAnalysis>(F);
MemoryDependenceResults *MD = &AM.getResult<MemoryDependenceAnalysis>(F);		DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);
const TargetLibraryInfo *TLI = &AM.getResult<TargetLibraryAnalysis>(F);
		if (EnableMemorySSA) {
		MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
		PostDominatorTree &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);

if (!eliminateDeadStores(F, AA, MD, DT, TLI))		if (!eliminateDeadStoresMemorySSA(F, AA, MSSA, DT, PDT, TLI))
return PreservedAnalyses::all();		return PreservedAnalyses::all();
		} else {
		MemoryDependenceResults &MD = AM.getResult<MemoryDependenceAnalysis>(F);

		if (!eliminateDeadStores(F, &AA, &MD, &DT, &TLI))
		return PreservedAnalyses::all();
		}

PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
PA.preserve<GlobalsAA>();		PA.preserve<GlobalsAA>();
		if (EnableMemorySSA)
		PA.preserve<MemorySSAAnalysis>();
		else
PA.preserve<MemoryDependenceAnalysis>();		PA.preserve<MemoryDependenceAnalysis>();
return PA;		return PA;
}		}

namespace {		namespace {

/// A legacy pass for the legacy pass manager that wraps \c DSEPass.		/// A legacy pass for the legacy pass manager that wraps \c DSEPass.
class DSELegacyPass : public FunctionPass {		class DSELegacyPass : public FunctionPass {
public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid

DSELegacyPass() : FunctionPass(ID) {		DSELegacyPass() : FunctionPass(ID) {
initializeDSELegacyPassPass(*PassRegistry::getPassRegistry());		initializeDSELegacyPassPass(*PassRegistry::getPassRegistry());
}		}

bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;

DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		AliasAnalysis &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
AliasAnalysis *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
MemoryDependenceResults *MD =		const TargetLibraryInfo &TLI =
&getAnalysis<MemoryDependenceWrapperPass>().getMemDep();		getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
const TargetLibraryInfo *TLI =
&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		if (EnableMemorySSA) {
		MemorySSA &MSSA = getAnalysis<MemorySSAWrapperPass>().getMSSA();
		PostDominatorTree &PDT =
		getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();

return eliminateDeadStores(F, AA, MD, DT, TLI);		return eliminateDeadStoresMemorySSA(F, AA, MSSA, DT, PDT, TLI);
		} else {
		MemoryDependenceResults &MD =
		getAnalysis<MemoryDependenceWrapperPass>().getMemDep();

		return eliminateDeadStores(F, &AA, &MD, &DT, &TLI);
		}
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<MemoryDependenceWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
		AU.addRequired<DominatorTreeWrapperPass>();
		AU.addPreserved<DominatorTreeWrapperPass>();

		if (EnableMemorySSA) {
		AU.addRequired<PostDominatorTreeWrapperPass>();
		AU.addRequired<MemorySSAWrapperPass>();
		AU.addPreserved<PostDominatorTreeWrapperPass>();
		AU.addPreserved<MemorySSAWrapperPass>();
		} else {
		AU.addRequired<MemoryDependenceWrapperPass>();
AU.addPreserved<MemoryDependenceWrapperPass>();		AU.addPreserved<MemoryDependenceWrapperPass>();
}		}
		}
};		};

} // end anonymous namespace		} // end anonymous namespace

char DSELegacyPass::ID = 0;		char DSELegacyPass::ID = 0;

INITIALIZE_PASS_BEGIN(DSELegacyPass, "dse", "Dead Store Elimination", false,		INITIALIZE_PASS_BEGIN(DSELegacyPass, "dse", "Dead Store Elimination", false,
false)		false)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)		INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(DSELegacyPass, "dse", "Dead Store Elimination", false,		INITIALIZE_PASS_END(DSELegacyPass, "dse", "Dead Store Elimination", false,
false)		false)

FunctionPass *llvm::createDeadStoreEliminationPass() {		FunctionPass *llvm::createDeadStoreEliminationPass() {
return new DSELegacyPass();		return new DSELegacyPass();
}		}

llvm/test/Transforms/DeadStoreElimination/MSSA/2011-09-06-EndOfFunction.ll

				; XFAIL: *
	; RUN: opt -dse -enable-dse-memoryssa -S < %s | FileCheck %s			; RUN: opt -dse -enable-dse-memoryssa -S < %s | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
	target triple = "x86_64-apple-darwin"			target triple = "x86_64-apple-darwin"

	%"class.std::auto_ptr" = type { i32* }			%"class.std::auto_ptr" = type { i32* }

	; CHECK-LABEL: @_Z3foov(			; CHECK-LABEL: @_Z3foov(

llvm/test/Transforms/DeadStoreElimination/MSSA/OverwriteStoreBegin.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; XFAIL: *
	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s

	define void @write4to7(i32* nocapture %p) {			define void @write4to7(i32* nocapture %p) {
	; CHECK-LABEL: @write4to7(			; CHECK-LABEL: @write4to7(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX0:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX0:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 1
	; CHECK-NEXT: [[P3:%.*]] = bitcast i32* [[ARRAYIDX0]] to i8*			; CHECK-NEXT: [[P3:%.*]] = bitcast i32* [[ARRAYIDX0]] to i8*
	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i8, i8* [[P3]], i64 4			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i8, i8* [[P3]], i64 4

llvm/test/Transforms/DeadStoreElimination/MSSA/OverwriteStoreEnd.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; XFAIL: *
	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s
	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	%struct.vec2 = type { <4 x i32>, <4 x i32> }			%struct.vec2 = type { <4 x i32>, <4 x i32> }
	%struct.vec2plusi = type { <4 x i32>, <4 x i32>, i32 }			%struct.vec2plusi = type { <4 x i32>, <4 x i32>, i32 }

	@glob1 = global %struct.vec2 zeroinitializer, align 16			@glob1 = global %struct.vec2 zeroinitializer, align 16
	@glob2 = global %struct.vec2plusi zeroinitializer, align 16			@glob2 = global %struct.vec2plusi zeroinitializer, align 16
	[... 381 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll

				; XFAIL: *
	; RUN: opt -basicaa -dse -enable-dse-memoryssa -S < %s | FileCheck %s			; RUN: opt -basicaa -dse -enable-dse-memoryssa -S < %s | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
	target triple = "x86_64-apple-macosx10.7.0"			target triple = "x86_64-apple-macosx10.7.0"

	; Sanity tests for atomic stores.			; Sanity tests for atomic stores.
	; Note that it turns out essentially every transformation DSE does is legal on			; Note that it turns out essentially every transformation DSE does is legal on
	; atomic ops, just some transformations are not allowed across release-acquire pairs.			; atomic ops, just some transformations are not allowed across release-acquire pairs.
	[... 124 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/calloc-store.ll

				; XFAIL: *

	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s

	declare noalias i8* @calloc(i64, i64)			declare noalias i8* @calloc(i64, i64)

	define i32* @test1() {			define i32* @test1() {
	; CHECK-LABEL: test1			; CHECK-LABEL: test1
	%1 = tail call noalias i8* @calloc(i64 1, i64 4)			%1 = tail call noalias i8* @calloc(i64 1, i64 4)
	%2 = bitcast i8* %1 to i32*			%2 = bitcast i8* %1 to i32*
	[... 57 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/fence-todo.ll

This file was copied from llvm/test/Transforms/DeadStoreElimination/MSSA/fence.ll.

	; RUN: opt -S -basicaa -dse -enable-dse-memoryssa < %s | FileCheck %s			; XFAIL: *

	; We conservative choose to prevent dead store elimination
	; across release or stronger fences. It's not required
	; (since the must still be a race on %addd.i), but
	; it is conservatively correct. A legal optimization
	; could hoist the second store above the fence, and then
	; DSE one of them.
	define void @test1(i32* %addr.i) {
	; CHECK-LABEL: @test1
	; CHECK: store i32 5
	; CHECK: fence
	; CHECK: store i32 5
	; CHECK: ret
	store i32 5, i32* %addr.i, align 4
	fence release
	store i32 5, i32* %addr.i, align 4
	ret void
	}

	; Same as previous, but with different values. If we ever optimize
	; this more aggressively, this allows us to check that the correct
	; store is retained (the 'i32 1' store in this case)
	define void @test1b(i32* %addr.i) {
	; CHECK-LABEL: @test1b
	; CHECK: store i32 42
	; CHECK: fence release
	; CHECK: store i32 1
	; CHECK: ret
	store i32 42, i32* %addr.i, align 4
	fence release
	store i32 1, i32* %addr.i, align 4
	ret void
	}

	; We could DSE across this fence, but don't. No other thread can			; RUN: opt -S -basicaa -dse -enable-dse-memoryssa < %s | FileCheck %s
	; observe the order of the acquire fence and the store.
	define void @test2(i32* %addr.i) {
	; CHECK-LABEL: @test2
	; CHECK: store
	; CHECK: fence
	; CHECK: store
	; CHECK: ret
	store i32 5, i32* %addr.i, align 4
	fence acquire
	store i32 5, i32* %addr.i, align 4
	ret void
	}

	; We DSE stack alloc'ed and byval locations, in the presence of fences.			; We DSE stack alloc'ed and byval locations, in the presence of fences.
	; Fence does not make an otherwise thread local store visible.			; Fence does not make an otherwise thread local store visible.
	; Right now the DSE in presence of fence is only done in end blocks (with no successors),			; Right now the DSE in presence of fence is only done in end blocks (with no successors),
	; but the same logic applies to other basic blocks as well.			; but the same logic applies to other basic blocks as well.
	; The store to %addr.i can be removed since it is a byval attribute			; The store to %addr.i can be removed since it is a byval attribute
	define void @test3(i32* byval %addr.i) {			define void @test3(i32* byval %addr.i) {
	; CHECK-LABEL: @test3			; CHECK-LABEL: @test3
	[... 31 lines not shown ...]
	; CHECK-NEXT: fence seq_cst			; CHECK-NEXT: fence seq_cst
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	%P1 = alloca i32			%P1 = alloca i32
	store i32 0, i32* %P1, align 4			store i32 0, i32* %P1, align 4
	fence seq_cst			fence seq_cst
	store i32 4, i32* %P1, align 4			store i32 4, i32* %P1, align 4
	ret void			ret void
	}			}

llvm/test/Transforms/DeadStoreElimination/MSSA/fence.ll

This file was copied to llvm/test/Transforms/DeadStoreElimination/MSSA/fence-todo.ll.

	[... 40 lines not shown ...]
	; CHECK: fence			; CHECK: fence
	; CHECK: store			; CHECK: store
	; CHECK: ret			; CHECK: ret
	store i32 5, i32* %addr.i, align 4			store i32 5, i32* %addr.i, align 4
	fence acquire			fence acquire
	store i32 5, i32* %addr.i, align 4			store i32 5, i32* %addr.i, align 4
	ret void			ret void
	}			}

	; We DSE stack alloc'ed and byval locations, in the presence of fences.
	; Fence does not make an otherwise thread local store visible.
	; Right now the DSE in presence of fence is only done in end blocks (with no successors),
	; but the same logic applies to other basic blocks as well.
	; The store to %addr.i can be removed since it is a byval attribute
	define void @test3(i32* byval %addr.i) {
	; CHECK-LABEL: @test3
	; CHECK-NOT: store
	; CHECK: fence
	; CHECK: ret
	store i32 5, i32* %addr.i, align 4
	fence release
	ret void
	}

	declare void @foo(i8* nocapture %p)

	declare noalias i8* @malloc(i32)

	; DSE of stores in locations allocated through library calls.
	define void @test_nocapture() {
	; CHECK-LABEL: @test_nocapture
	; CHECK: malloc
	; CHECK: foo
	; CHECK-NOT: store
	; CHECK: fence
	%m = call i8* @malloc(i32 24)
	call void @foo(i8* %m)
	store i8 4, i8* %m
	fence release
	ret void
	}


	; This is a full fence, but it does not make a thread local store visible.
	; We can DSE the store in presence of the fence.
	define void @fence_seq_cst() {
	; CHECK-LABEL: @fence_seq_cst
	; CHECK-NEXT: fence seq_cst
	; CHECK-NEXT: ret void
	%P1 = alloca i32
	store i32 0, i32* %P1, align 4
	fence seq_cst
	store i32 4, i32* %P1, align 4
	ret void
	}

llvm/test/Transforms/DeadStoreElimination/MSSA/free.ll

				; XFAIL: *

	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s

	target datalayout = "e-p:64:64:64"			target datalayout = "e-p:64:64:64"

	declare void @free(i8* nocapture)			declare void @free(i8* nocapture)
	declare noalias i8* @malloc(i64)			declare noalias i8* @malloc(i64)

	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	[... 62 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/inst-limits.ll

; RUN: opt -S -dse -enable-dse-memoryssa < %s | FileCheck %s		; RUN: opt -S -dse -enable-dse-memoryssa < %s | FileCheck %s
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

; If there are two stores to the same location, DSE should be able to remove		; This test is not relevant for DSE with MemorySSA. Non-memory instructions
; the first store if the two stores are separated by no more than 98		; are ignored anyways. The limits for the MemorySSA traversal are tested in
; instructions. The existence of debug intrinsics between the stores should		; llvm/test/Transforms/DeadStoreElimination/MSSA/memoryssa-scan-limit.ll
; not affect this instruction limit.

@x = global i32 0, align 4		@x = global i32 0, align 4

; Function Attrs: nounwind		; Function Attrs: nounwind
define i32 @test_within_limit() !dbg !4 {		define i32 @test_within_limit() !dbg !4 {
entry:		entry:
; The first store; later there is a second store to the same location,		; The first store; later there is a second store to the same location,
; so this store should be optimized away by DSE.		; so this store should be optimized away by DSE.
[... 108 lines not shown ...]	entry:
store i32 -1, i32* @x, align 4		store i32 -1, i32* @x, align 4
ret i32 0		ret i32 0
}		}

; Function Attrs: nounwind		; Function Attrs: nounwind
define i32 @test_outside_limit() {		define i32 @test_outside_limit() {
entry:		entry:
; The first store; later there is a second store to the same location		; The first store; later there is a second store to the same location
; CHECK: store i32 1, i32* @x, align 4		; CHECK-NOT: store i32 1, i32* @x, align 4
store i32 1, i32* @x, align 4		store i32 1, i32* @x, align 4

; Insert 99 dummy instructions between the two stores; this is		; Insert 99 dummy instructions between the two stores; this is
; one too many instruction for the DSE to take place.		; one too many instruction for the DSE to take place.
%0 = bitcast i32 0 to i32		%0 = bitcast i32 0 to i32
%1 = bitcast i32 0 to i32		%1 = bitcast i32 0 to i32
%2 = bitcast i32 0 to i32		%2 = bitcast i32 0 to i32
%3 = bitcast i32 0 to i32		%3 = bitcast i32 0 to i32
[... 121 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/lifetime.ll

				; XFAIL: *

	; RUN: opt -S -basicaa -dse -enable-dse-memoryssa < %s | FileCheck %s			; RUN: opt -S -basicaa -dse -enable-dse-memoryssa < %s | FileCheck %s

	target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"			target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

	declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) nounwind			declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) nounwind
	declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) nounwind			declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) nounwind
	declare void @llvm.memset.p0i8.i8(i8* nocapture, i8, i8, i1) nounwind			declare void @llvm.memset.p0i8.i8(i8* nocapture, i8, i8, i1) nounwind

	[... 29 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/memcpy-complete-overwrite.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

				; XFAIL: *
	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s
	; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -enable-dse-memoryssa -S | FileCheck %s
	target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"			target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

	declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) nounwind			declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) nounwind
	declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture, i8, i64, i32) nounwind			declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture, i8, i64, i32) nounwind
	declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind			declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind
	declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind
	[... 178 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/memintrinsics.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; XFAIL: *
	; RUN: opt -S -dse -enable-dse-memoryssa < %s | FileCheck %s			; RUN: opt -S -dse -enable-dse-memoryssa < %s | FileCheck %s

	declare void @llvm.memcpy.p0i8.p0i8.i8(i8* nocapture, i8* nocapture, i8, i1) nounwind			declare void @llvm.memcpy.p0i8.p0i8.i8(i8* nocapture, i8* nocapture, i8, i1) nounwind
	declare void @llvm.memmove.p0i8.p0i8.i8(i8* nocapture, i8* nocapture, i8, i1) nounwind			declare void @llvm.memmove.p0i8.p0i8.i8(i8* nocapture, i8* nocapture, i8, i1) nounwind
	declare void @llvm.memset.p0i8.i8(i8* nocapture, i8, i8, i1) nounwind			declare void @llvm.memset.p0i8.i8(i8* nocapture, i8, i8, i1) nounwind

	define void @test1() {			define void @test1() {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	[... 88 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/memoryssa-scan-limit.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck --check-prefix=NO-LIMIT %s
				; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -dse-memoryssa-scanlimit=0 -S | FileCheck --check-prefix=LIMIT-0 %s
				; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -dse-memoryssa-scanlimit=3 -S | FileCheck --check-prefix=LIMIT-3 %s
				; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -dse-memoryssa-scanlimit=4 -S | FileCheck --check-prefix=LIMIT-4 %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"


				define void @test2(i32* noalias %P, i32* noalias %Q, i32* noalias %R) {
				;
				; NO-LIMIT-LABEL: @test2(
				; NO-LIMIT-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
				; NO-LIMIT: bb1:
				; NO-LIMIT-NEXT: br label [[BB3:%.*]]
				; NO-LIMIT: bb2:
				; NO-LIMIT-NEXT: br label [[BB3]]
				; NO-LIMIT: bb3:
				; NO-LIMIT-NEXT: store i32 0, i32* [[Q:%.*]]
				; NO-LIMIT-NEXT: store i32 0, i32* [[R:%.*]]
				; NO-LIMIT-NEXT: store i32 0, i32* [[P:%.*]]
				; NO-LIMIT-NEXT: ret void
				;
				; LIMIT-0-LABEL: @test2(
				; LIMIT-0-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
				; LIMIT-0: bb1:
				; LIMIT-0-NEXT: br label [[BB3:%.*]]
				; LIMIT-0: bb2:
				; LIMIT-0-NEXT: br label [[BB3]]
				; LIMIT-0: bb3:
				; LIMIT-0-NEXT: store i32 0, i32* [[Q:%.*]]
				; LIMIT-0-NEXT: store i32 0, i32* [[R:%.*]]
				; LIMIT-0-NEXT: store i32 0, i32* [[P:%.*]]
				; LIMIT-0-NEXT: ret void
				;
				; LIMIT-3-LABEL: @test2(
				; LIMIT-3-NEXT: store i32 1, i32* [[P:%.*]]
				; LIMIT-3-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
				; LIMIT-3: bb1:
				; LIMIT-3-NEXT: br label [[BB3:%.*]]
				; LIMIT-3: bb2:
				; LIMIT-3-NEXT: br label [[BB3]]
				; LIMIT-3: bb3:
				; LIMIT-3-NEXT: store i32 0, i32* [[Q:%.*]]
				; LIMIT-3-NEXT: store i32 0, i32* [[R:%.*]]
				; LIMIT-3-NEXT: store i32 0, i32* [[P]]
				; LIMIT-3-NEXT: ret void
				;
				; LIMIT-4-LABEL: @test2(
				; LIMIT-4-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
				; LIMIT-4: bb1:
				; LIMIT-4-NEXT: br label [[BB3:%.*]]
				; LIMIT-4: bb2:
				; LIMIT-4-NEXT: br label [[BB3]]
				; LIMIT-4: bb3:
				; LIMIT-4-NEXT: store i32 0, i32* [[Q:%.*]]
				; LIMIT-4-NEXT: store i32 0, i32* [[R:%.*]]
				; LIMIT-4-NEXT: store i32 0, i32* [[P:%.*]]
				; LIMIT-4-NEXT: ret void
				;
				store i32 1, i32* %P
				br i1 true, label %bb1, label %bb2
				bb1:
				br label %bb3
				bb2:
				br label %bb3
				bb3:
				store i32 0, i32* %Q
				store i32 0, i32* %R
				store i32 0, i32* %P
				ret void
				}

llvm/test/Transforms/DeadStoreElimination/MSSA/memset-and-memcpy.ll

[... 53 lines not shown ...]	;
tail call void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* align 1 %P, i8 42, i64 8, i32 1)		tail call void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* align 1 %P, i8 42, i64 8, i32 1)
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i1 false)		tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i1 false)
ret void		ret void
}		}

; Should not delete the volatile memset.		; Should not delete the volatile memset.
define void @test17v(i8* %P, i8* %Q) nounwind ssp {		define void @test17v(i8* %P, i8* %Q) nounwind ssp {
; CHECK-LABEL: @test17v(		; CHECK-LABEL: @test17v(
; CHECK-NEXT: tail call void @llvm.memset.p0i8.i64(i8* [[P:%.*]], i8 42, i64 8, i1 true)		; CHECK-NEXT: tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[P:%.*]], i8* [[Q:%.*]], i64 12, i1 false)
; CHECK-NEXT: tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[P]], i8* [[Q:%.*]], i64 12, i1 false)
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
tail call void @llvm.memset.p0i8.i64(i8* %P, i8 42, i64 8, i1 true)		tail call void @llvm.memset.p0i8.i64(i8* %P, i8 42, i64 8, i1 true)
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %Q, i64 12, i1 false)		tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %Q, i64 12, i1 false)
ret void		ret void
}		}

; According to the current LangRef, memcpy's source and destination cannot		; According to the current LangRef, memcpy's source and destination cannot
; overlap, hence the first memcpy is dead.		; overlap, hence the first memcpy is dead.
;		;
; Previously this was not allowed (PR8728), also discussed in PR11763.		; Previously this was not allowed (PR8728), also discussed in PR11763.
define void @test18(i8* %P, i8* %Q, i8* %R) nounwind ssp {		define void @test18(i8* %P, i8* %Q, i8* %R) nounwind ssp {
; CHECK-LABEL: @test18(		; CHECK-LABEL: @test18(
; CHECK-NEXT: tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[P:%.*]], i8* [[Q:%.*]], i64 12, i1 false)		; CHECK-NEXT: tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[P:%.*]], i8* [[R:%.*]], i64 12, i1 false)
; CHECK-NEXT: tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[P]], i8* [[R:%.*]], i64 12, i1 false)
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %Q, i64 12, i1 false)		tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %Q, i64 12, i1 false)
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %R, i64 12, i1 false)		tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %R, i64 12, i1 false)
ret void		ret void
}		}

define void @test18_atomic(i8* %P, i8* %Q, i8* %R) nounwind ssp {		define void @test18_atomic(i8* %P, i8* %Q, i8* %R) nounwind ssp {
; CHECK-LABEL: @test18_atomic(		; CHECK-LABEL: @test18_atomic(
; CHECK-NEXT: tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 [[P:%.*]], i8* align 1 [[Q:%.*]], i64 12, i32 1)		; CHECK-NEXT: tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 [[P:%.*]], i8* align 1 [[R:%.*]], i64 12, i32 1)
; CHECK-NEXT: tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 [[P]], i8* align 1 [[R:%.*]], i64 12, i32 1)
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i32 1)		tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i32 1)
tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %R, i64 12, i32 1)		tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %R, i64 12, i32 1)
ret void		ret void
}		}

llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll

	; Test that the getelementptr generated when the dse pass determines that			; Test that the getelementptr generated when the dse pass determines that
	; a memset can be shortened has the debugloc carried over from the memset.			; a memset can be shortened has the debugloc carried over from the memset.

				; XFAIL: *
	; RUN: opt -S -march=native -dse -enable-dse-memoryssa < %s| FileCheck %s			; RUN: opt -S -march=native -dse -enable-dse-memoryssa < %s| FileCheck %s
	; CHECK: bitcast [5 x i64]* %{{[a-zA-Z_][a-zA-Z0-9_]*}} to i8*, !dbg			; CHECK: bitcast [5 x i64]* %{{[a-zA-Z_][a-zA-Z0-9_]*}} to i8*, !dbg
	; CHECK-NEXT: %{{[0-9]+}} = getelementptr inbounds i8, i8* %0, i64 32, !dbg ![[DBG:[0-9]+]]			; CHECK-NEXT: %{{[0-9]+}} = getelementptr inbounds i8, i8* %0, i64 32, !dbg ![[DBG:[0-9]+]]
	; CHECK: ![[DBG]] = !DILocation(line: 2,			; CHECK: ![[DBG]] = !DILocation(line: 2,

	; The test IR is generated by running:			; The test IR is generated by running:
	;			;
	; clang Debugify_Dead_Store_Elimination.cpp -Wno-c++11-narrowing -S \			; clang Debugify_Dead_Store_Elimination.cpp -Wno-c++11-narrowing -S \
	[... 78 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/merge-stores-big-endian.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; XFAIL: *
	; RUN: opt -dse -enable-dse-memoryssa -enable-dse-partial-store-merging -S < %s | FileCheck %s			; RUN: opt -dse -enable-dse-memoryssa -enable-dse-partial-store-merging -S < %s | FileCheck %s
	target datalayout = "E-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "E-m:e-i64:64-i128:128-n32:64-S128"

	define void @byte_by_byte_replacement(i32 *%ptr) {			define void @byte_by_byte_replacement(i32 *%ptr) {
	; CHECK-LABEL: @byte_by_byte_replacement(			; CHECK-LABEL: @byte_by_byte_replacement(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: store i32 151653132, i32* [[PTR:%.*]]			; CHECK-NEXT: store i32 151653132, i32* [[PTR:%.*]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	[... 163 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/merge-stores.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; XFAIL: *
	; RUN: opt -dse -enable-dse-memoryssa -enable-dse-partial-store-merging -S < %s | FileCheck %s			; RUN: opt -dse -enable-dse-memoryssa -enable-dse-partial-store-merging -S < %s | FileCheck %s
	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64"

	define void @byte_by_byte_replacement(i32 *%ptr) {			define void @byte_by_byte_replacement(i32 *%ptr) {
	; CHECK-LABEL: @byte_by_byte_replacement(			; CHECK-LABEL: @byte_by_byte_replacement(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: store i32 202050057, i32* [[PTR:%.*]]			; CHECK-NEXT: store i32 202050057, i32* [[PTR:%.*]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	[... 227 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-captures.ll

[... 19 lines not shown ...]	;
store i8 1, i8* %m		store i8 1, i8* %m
ret i8* %m		ret i8* %m
}		}

; Same as @test_return_captures_1, but across BBs.		; Same as @test_return_captures_1, but across BBs.
define i8* @test_return_captures_2() {		define i8* @test_return_captures_2() {
; CHECK-LABEL: @test_return_captures_2(		; CHECK-LABEL: @test_return_captures_2(
; CHECK-NEXT: [[M:%.*]] = call i8* @malloc(i64 24)		; CHECK-NEXT: [[M:%.*]] = call i8* @malloc(i64 24)
; CHECK-NEXT: store i8 0, i8* [[M]]
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: store i8 1, i8* [[M]]		; CHECK-NEXT: store i8 1, i8* [[M]]
; CHECK-NEXT: ret i8* [[M]]		; CHECK-NEXT: ret i8* [[M]]
;		;
%m = call i8* @malloc(i64 24)		%m = call i8* @malloc(i64 24)
store i8 0, i8* %m		store i8 0, i8* %m
br label %exit		br label %exit
[... 49 lines not shown ...]	exit:
ret i8* %m		ret i8* %m
}		}

; We can remove the first store store i8 0, i8* %m because there are no throwing		; We can remove the first store store i8 0, i8* %m because there are no throwing
; instructions between the 2 stores and also %m escapes after the killing store.		; instructions between the 2 stores and also %m escapes after the killing store.
define i8* @test_malloc_capture_3() {		define i8* @test_malloc_capture_3() {
; CHECK-LABEL: @test_malloc_capture_3(		; CHECK-LABEL: @test_malloc_capture_3(
; CHECK-NEXT: [[M:%.*]] = call i8* @malloc(i64 24)		; CHECK-NEXT: [[M:%.*]] = call i8* @malloc(i64 24)
; CHECK-NEXT: store i8 0, i8* [[M]]
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: store i8 1, i8* [[M]]		; CHECK-NEXT: store i8 1, i8* [[M]]
; CHECK-NEXT: call void @capture(i8* [[M]])		; CHECK-NEXT: call void @capture(i8* [[M]])
; CHECK-NEXT: ret i8* [[M]]		; CHECK-NEXT: ret i8* [[M]]
;		;
%m = call i8* @malloc(i64 24)		%m = call i8* @malloc(i64 24)
store i8 0, i8* %m		store i8 0, i8* %m
[... 83 lines not shown ...]	exit:
ret i8* %m		ret i8* %m
}		}

; We can remove the first store 'store i8 0, i8* %m' even though there is a		; We can remove the first store 'store i8 0, i8* %m' even though there is a
; throwing instruction between them, because %m escapes after the killing store.		; throwing instruction between them, because %m escapes after the killing store.
define i8* @test_malloc_capture_7() {		define i8* @test_malloc_capture_7() {
; CHECK-LABEL: @test_malloc_capture_7(		; CHECK-LABEL: @test_malloc_capture_7(
; CHECK-NEXT: [[M:%.*]] = call i8* @malloc(i64 24)		; CHECK-NEXT: [[M:%.*]] = call i8* @malloc(i64 24)
; CHECK-NEXT: store i8 0, i8* [[M]]
; CHECK-NEXT: call void @may_throw()		; CHECK-NEXT: call void @may_throw()
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: store i8 1, i8* [[M]]		; CHECK-NEXT: store i8 1, i8* [[M]]
; CHECK-NEXT: call void @capture(i8* [[M]])		; CHECK-NEXT: call void @capture(i8* [[M]])
; CHECK-NEXT: ret i8* [[M]]		; CHECK-NEXT: ret i8* [[M]]
;		;

%m = call i8* @malloc(i64 24)		%m = call i8* @malloc(i64 24)
store i8 0, i8* %m		store i8 0, i8* %m
call void @may_throw()		call void @may_throw()
br label %exit		br label %exit

exit:		exit:
store i8 1, i8* %m		store i8 1, i8* %m
call void @capture(i8* %m)		call void @capture(i8* %m)
ret i8* %m		ret i8* %m
}		}
; TODO: Remove store in exit.		; TODO: Remove store in exit.
; Stores to stack objects can be eliminated if they are not captured inside the function.		; Stores to stack objects can be eliminated if they are not captured inside the function.
define void @test_alloca_nocapture_1() {		define void @test_alloca_nocapture_1() {
; CHECK-LABEL: @test_alloca_nocapture_1(		; CHECK-LABEL: @test_alloca_nocapture_1(
; CHECK-NEXT: [[M:%.*]] = alloca i8		; CHECK-NEXT: [[M:%.*]] = alloca i8
; CHECK-NEXT: store i8 0, i8* [[M]]
; CHECK-NEXT: call void @foo()		; CHECK-NEXT: call void @foo()
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
		; CHECK-NEXT: store i8 1, i8* [[M]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%m = alloca i8		%m = alloca i8
store i8 0, i8* %m		store i8 0, i8* %m
call void @foo()		call void @foo()
br label %exit		br label %exit

exit:		exit:
store i8 1, i8* %m		store i8 1, i8* %m
ret void		ret void
}		}

; TODO: Remove store in exit.		; TODO: Remove store in exit.
; Cannot remove first store i8 0, i8* %m, as the call to @capture captures the object.		; Cannot remove first store i8 0, i8* %m, as the call to @capture captures the object.
define void @test_alloca_capture_1() {		define void @test_alloca_capture_1() {
; CHECK-LABEL: @test_alloca_capture_1(		; CHECK-LABEL: @test_alloca_capture_1(
; CHECK-NEXT: [[M:%.*]] = alloca i8		; CHECK-NEXT: [[M:%.*]] = alloca i8
; CHECK-NEXT: store i8 0, i8* [[M]]		; CHECK-NEXT: store i8 0, i8* [[M]]
; CHECK-NEXT: call void @capture(i8* [[M]])		; CHECK-NEXT: call void @capture(i8* [[M]])
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
		; CHECK-NEXT: store i8 1, i8* [[M]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%m = alloca i8		%m = alloca i8
store i8 0, i8* %m		store i8 0, i8* %m
call void @capture(i8* %m)		call void @capture(i8* %m)
br label %exit		br label %exit

exit:		exit:
store i8 1, i8* %m		store i8 1, i8* %m
ret void		ret void
}		}

; TODO: Remove store at exit.		; TODO: Remove store at exit.
; We can remove the last store to %m, even though it escapes because the alloca		; We can remove the last store to %m, even though it escapes because the alloca
; becomes invalid after the function returns.		; becomes invalid after the function returns.
define void @test_alloca_capture_2(%S1* %E) {		define void @test_alloca_capture_2(%S1* %E) {
; CHECK-LABEL: @test_alloca_capture_2(		; CHECK-LABEL: @test_alloca_capture_2(
; CHECK-NEXT: [[M:%.*]] = alloca i8		; CHECK-NEXT: [[M:%.*]] = alloca i8
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: [[F_PTR:%.*]] = getelementptr [[S1:%.*]], %S1* [[E:%.*]], i32 0, i32 0		; CHECK-NEXT: [[F_PTR:%.*]] = getelementptr [[S1:%.*]], %S1* [[E:%.*]], i32 0, i32 0
; CHECK-NEXT: store i8* [[M]], i8** [[F_PTR]]		; CHECK-NEXT: store i8* [[M]], i8** [[F_PTR]]
		; CHECK-NEXT: store i8 1, i8* [[M]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%m = alloca i8		%m = alloca i8
br label %exit		br label %exit

exit:		exit:
%f.ptr = getelementptr %S1, %S1* %E, i32 0, i32 0		%f.ptr = getelementptr %S1, %S1* %E, i32 0, i32 0
store i8* %m, i8** %f.ptr		store i8* %m, i8** %f.ptr
[... 51 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-exceptions.ll

	[... 24 lines not shown ...]
	; CHECK: catch:			; CHECK: catch:
	; CHECK-NEXT: [[C:%.*]] = catchpad within [[CS1]] []			; CHECK-NEXT: [[C:%.*]] = catchpad within [[CS1]] []
	; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[SV]]			; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[SV]]
	; CHECK-NEXT: br label [[EXIT]]			; CHECK-NEXT: br label [[EXIT]]
	; CHECK: cleanup:			; CHECK: cleanup:
	; CHECK-NEXT: [[C1:%.*]] = cleanuppad within none []			; CHECK-NEXT: [[C1:%.*]] = cleanuppad within none []
	; CHECK-NEXT: br label [[EXIT]]			; CHECK-NEXT: br label [[EXIT]]
	; CHECK: exit:			; CHECK: exit:
				; CHECK-NEXT: store i32 40, i32* [[SV]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	block1:			block1:
	%sv = alloca i32			%sv = alloca i32
	br label %block2			br label %block2

	block2:			block2:
	store i32 20, i32* %sv			store i32 20, i32* %sv
	[... 23 lines not shown ...]

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-loops.ll

[... 105 lines not shown ...]
; CHECK-NEXT: br i1 [[CMP27]], label [[FOR_BODY4_LR_PH_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]		; CHECK-NEXT: br i1 [[CMP27]], label [[FOR_BODY4_LR_PH_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
; CHECK: for.body4.lr.ph.preheader:		; CHECK: for.body4.lr.ph.preheader:
; CHECK-NEXT: br label [[FOR_BODY4_LR_PH:%.*]]		; CHECK-NEXT: br label [[FOR_BODY4_LR_PH:%.*]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
; CHECK: for.body4.lr.ph:		; CHECK: for.body4.lr.ph:
; CHECK-NEXT: [[I_028:%.*]] = phi i32 [ [[INC11:%.*]], [[FOR_COND_CLEANUP3:%.*]] ], [ 0, [[FOR_BODY4_LR_PH_PREHEADER]] ]		; CHECK-NEXT: [[I_028:%.*]] = phi i32 [ [[INC11:%.*]], [[FOR_COND_CLEANUP3:%.*]] ], [ 0, [[FOR_BODY4_LR_PH_PREHEADER]] ]
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 [[I_028]]		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 [[I_028]]
; CHECK-NEXT: store i32 0, i32* [[ARRAYIDX]], align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[I_028]], [[N]]		; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[I_028]], [[N]]
; CHECK-NEXT: br label [[FOR_BODY4:%.*]]		; CHECK-NEXT: br label [[FOR_BODY4:%.*]]
; CHECK: for.body4:		; CHECK: for.body4:
; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ 0, [[FOR_BODY4_LR_PH]] ], [ [[ADD9:%.*]], [[FOR_BODY4]] ]		; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ 0, [[FOR_BODY4_LR_PH]] ], [ [[ADD9:%.*]], [[FOR_BODY4]] ]
; CHECK-NEXT: [[J_026:%.*]] = phi i32 [ 0, [[FOR_BODY4_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY4]] ]		; CHECK-NEXT: [[J_026:%.*]] = phi i32 [ 0, [[FOR_BODY4_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY4]] ]
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[J_026]], [[MUL]]		; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[J_026]], [[MUL]]
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[ADD]]		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[ADD]]
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX5]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX5]], align 4
(42 lines not shown)	for.body4: ; preds = %for.body4, %for.body4.lr.ph
br i1 %exitcond, label %for.cond.cleanup3, label %for.body4		br i1 %exitcond, label %for.cond.cleanup3, label %for.body4

for.cond.cleanup3: ; preds = %for.body4		for.cond.cleanup3: ; preds = %for.body4
store i32 %add9, i32* %arrayidx, align 4		store i32 %add9, i32* %arrayidx, align 4
%inc11 = add nuw nsw i32 %i.028, 1		%inc11 = add nuw nsw i32 %i.028, 1
%exitcond29 = icmp eq i32 %inc11, %N		%exitcond29 = icmp eq i32 %inc11, %N
br i1 %exitcond29, label %for.cond.cleanup, label %for.body4.lr.ph		br i1 %exitcond29, label %for.cond.cleanup, label %for.body4.lr.ph
}		}

		declare i1 @cond() readnone

		; TODO: We can eliminate the store in for.header, but we currently hit a MemoryPhi.
		define void @loop_multiple_def_uses(i32* noalias %P) {
		; CHECK-LABEL: @loop_multiple_def_uses(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: br label [[FOR_HEADER:%.*]]
		; CHECK: for.header:
		; CHECK-NEXT: store i32 1, i32* [[P:%.*]], align 4
		; CHECK-NEXT: [[C1:%.*]] = call i1 @cond()
		; CHECK-NEXT: br i1 [[C1]], label [[FOR_BODY:%.*]], label [[END:%.*]]
		; CHECK: for.body:
		; CHECK-NEXT: store i32 1, i32* [[P]], align 4
		; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[P]]
		; CHECK-NEXT: br label [[FOR_HEADER]]
		; CHECK: end:
		; CHECK-NEXT: store i32 3, i32* [[P]], align 4
		; CHECK-NEXT: ret void
		;
		entry:
		br label %for.header

		for.header:
		store i32 1, i32* %P, align 4
		%c1 = call i1 @cond()
		br i1 %c1, label %for.body, label %end

		for.body:
		store i32 1, i32* %P, align 4
		%lv = load i32, i32* %P
		br label %for.header

		end:
		store i32 3, i32* %P, align 4
		ret void
		}

		; We cannot eliminate the store in for.header, as it is only partially
		; overwritten in for.body and read afterwards.
		define void @loop_multiple_def_uses_partial_write(i32* noalias %p) {
		; CHECK-LABEL: @loop_multiple_def_uses_partial_write(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: br label [[FOR_HEADER:%.*]]
		; CHECK: for.header:
		; CHECK-NEXT: store i32 1239491, i32* [[P:%.*]], align 4
		; CHECK-NEXT: [[C1:%.*]] = call i1 @cond()
		; CHECK-NEXT: br i1 [[C1]], label [[FOR_BODY:%.*]], label [[END:%.*]]
		; CHECK: for.body:
		; CHECK-NEXT: [[C:%.*]] = bitcast i32* [[P]] to i8*
		; CHECK-NEXT: store i8 1, i8* [[C]], align 4
		; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[P]]
		; CHECK-NEXT: br label [[FOR_HEADER]]
		; CHECK: end:
		; CHECK-NEXT: store i32 3, i32* [[P]], align 4
		; CHECK-NEXT: ret void
		;
		entry:
		br label %for.header

		for.header:
		store i32 1239491, i32* %p, align 4
		%c1 = call i1 @cond()
		br i1 %c1, label %for.body, label %end

		for.body:
		%c = bitcast i32* %p to i8*
		store i8 1, i8* %c, align 4
		%lv = load i32, i32* %p
		br label %for.header

		end:
		store i32 3, i32* %p, align 4
		ret void
		}

		; We cannot eliminate the store in for.header, as the location is not overwritten
		; in for.body and read afterwards.
		define void @loop_multiple_def_uses_mayalias_write(i32* %p, i32* %q) {

		; CHECK-LABEL: @loop_multiple_def_uses_mayalias_write(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: br label [[FOR_HEADER:%.*]]
		; CHECK: for.header:
		; CHECK-NEXT: store i32 1239491, i32* [[P:%.*]], align 4
		; CHECK-NEXT: [[C1:%.*]] = call i1 @cond()
		; CHECK-NEXT: br i1 [[C1]], label [[FOR_BODY:%.*]], label [[END:%.*]]
		; CHECK: for.body:
		; CHECK-NEXT: store i32 1, i32* [[Q:%.*]], align 4
		; CHECK-NEXT: [[LV:%.*]] = load i32, i32* [[P]]
		; CHECK-NEXT: br label [[FOR_HEADER]]
		; CHECK: end:
		; CHECK-NEXT: store i32 3, i32* [[P]], align 4
		; CHECK-NEXT: ret void
		;
		entry:
		br label %for.header

		for.header:
		store i32 1239491, i32* %p, align 4
		%c1 = call i1 @cond()
		br i1 %c1, label %for.body, label %end

		for.body:
		store i32 1, i32* %q, align 4
		%lv = load i32, i32* %p
		br label %for.header

		end:
		store i32 3, i32* %p, align 4
		ret void
		}

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-memoryphis.ll

	(97 lines not shown)
	bb1:			bb1:
	br label %bb3			br label %bb3
	bb2:			bb2:
	br label %bb3			br label %bb3
	bb3:			bb3:
	store i8 1, i8* %P2			store i8 1, i8* %P2
	ret void			ret void
	}			}

				declare void @hoge()

				; Check a function with a MemoryPhi with 3 incoming values.
				define void @widget(i32* %Ptr, i1 %c1, i1 %c2, i32 %v1, i32 %v2, i32 %v3) {
				; CHECK-LABEL: @widget(
				; CHECK-NEXT: bb:
				; CHECK-NEXT: tail call void @hoge()
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[BB3:%.*]], label [[BB1:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: br i1 [[C2:%.*]], label [[BB2:%.*]], label [[BB3]]
				; CHECK: bb2:
				; CHECK-NEXT: store i32 -1, i32* [[PTR:%.*]], align 4
				; CHECK-NEXT: br label [[BB3]]
				; CHECK: bb3:
				; CHECK-NEXT: br label [[BB4:%.*]]
				; CHECK: bb4:
				; CHECK-NEXT: switch i32 [[V1:%.*]], label [[BB8:%.*]] [
				; CHECK-NEXT: i32 0, label [[BB5:%.*]]
				; CHECK-NEXT: i32 1, label [[BB6:%.*]]
				; CHECK-NEXT: i32 2, label [[BB7:%.*]]
				; CHECK-NEXT: ]
				; CHECK: bb5:
				; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4
				; CHECK-NEXT: br label [[BB8]]
				; CHECK: bb6:
				; CHECK-NEXT: store i32 1, i32* [[PTR]], align 4
				; CHECK-NEXT: br label [[BB8]]
				; CHECK: bb7:
				; CHECK-NEXT: store i32 2, i32* [[PTR]], align 4
				; CHECK-NEXT: br label [[BB8]]
				; CHECK: bb8:
				; CHECK-NEXT: br label [[BB4]]
				;
				bb:
				tail call void @hoge()
				br i1 %c1, label %bb3, label %bb1

				bb1: ; preds = %bb
				br i1 %c2, label %bb2, label %bb3

				bb2: ; preds = %bb1
				store i32 -1, i32* %Ptr, align 4
				br label %bb3

				bb3: ; preds = %bb2, %bb1, %bb
				br label %bb4

				bb4: ; preds = %bb8, %bb3
				switch i32 %v1, label %bb8 [
				i32 0, label %bb5
				i32 1, label %bb6
				i32 2, label %bb7
				]

				bb5: ; preds = %bb4
				store i32 0, i32* %Ptr, align 4
				br label %bb8

				bb6: ; preds = %bb4
				store i32 1, i32* %Ptr, align 4
				br label %bb8

				bb7: ; preds = %bb4
				store i32 2, i32* %Ptr, align 4
				br label %bb8

				bb8: ; preds = %bb7, %bb6, %bb5, %bb4
				br label %bb4
				}

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-partial.ll

(26 lines not shown)	bb3:
%P.i16 = bitcast i32* %P to i16*		%P.i16 = bitcast i32* %P to i16*
store i16 0, i16* %P.i16		store i16 0, i16* %P.i16
ret void		ret void
}		}


define void @second_store_bigger(i32* noalias %P) {		define void @second_store_bigger(i32* noalias %P) {
; CHECK-LABEL: @second_store_bigger(		; CHECK-LABEL: @second_store_bigger(
; CHECK-NEXT: store i32 1, i32* [[P:%.*]]
; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]		; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: br label [[BB3:%.*]]		; CHECK-NEXT: br label [[BB3:%.*]]
; CHECK: bb2:		; CHECK: bb2:
; CHECK-NEXT: br label [[BB3]]		; CHECK-NEXT: br label [[BB3]]
; CHECK: bb3:		; CHECK: bb3:
; CHECK-NEXT: [[P_I64:%.*]] = bitcast i32* [[P]] to i64*		; CHECK-NEXT: [[P_I64:%.*]] = bitcast i32* [[P:%.*]] to i64*
; CHECK-NEXT: store i64 0, i64* [[P_I64]]		; CHECK-NEXT: store i64 0, i64* [[P_I64]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
store i32 1, i32* %P		store i32 1, i32* %P
br i1 true, label %bb1, label %bb2		br i1 true, label %bb1, label %bb2
bb1:		bb1:
br label %bb3		br label %bb3
bb2:		bb2:
br label %bb3		br label %bb3
bb3:		bb3:
%P.i64 = bitcast i32* %P to i64*		%P.i64 = bitcast i32* %P to i64*
store i64 0, i64* %P.i64		store i64 0, i64* %P.i64
ret void		ret void
}		}
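
The size-related tests in this diff (`@loop_multiple_def_uses_partial_write` and `@second_store_bigger`) turn on whether the later store covers every byte of the earlier one. The following is a minimal sketch of that byte-range condition only, not the code from this patch. `ByteRange` and `completelyOverwrites` are hypothetical names introduced for illustration, and the sketch assumes both stores are expressed as offsets from the same base pointer. The actual pass additionally relies on alias analysis and MemorySSA.

```cpp
#include <cstdint>

// Hypothetical model of the bytes touched by a store: [Offset, Offset + Size).
// Offsets are relative to a common base pointer (an assumption of this sketch).
struct ByteRange {
  std::int64_t Offset; // first byte written
  std::uint64_t Size;  // number of bytes written
};

// Returns true if Later writes every byte that Earlier wrote.
bool completelyOverwrites(const ByteRange &Later, const ByteRange &Earlier) {
  const std::int64_t LaterEnd =
      Later.Offset + static_cast<std::int64_t>(Later.Size);
  const std::int64_t EarlierEnd =
      Earlier.Offset + static_cast<std::int64_t>(Earlier.Size);
  return Later.Offset <= Earlier.Offset && LaterEnd >= EarlierEnd;
}
```

With this model, an 8-byte `store i64` at offset 0 covers a 4-byte `store i32` at offset 0, as in `@second_store_bigger`. A 1-byte `store i8` does not cover it, as in `@loop_multiple_def_uses_partial_write`.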

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-simple.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s		; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s

target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"		target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"


define void @test2(i32* noalias %P) {		define void @test2(i32* noalias %P) {
; CHECK-LABEL: @test2(		; CHECK-LABEL: @test2(
; CHECK-NEXT: store i32 1, i32* [[P:%.*]]
; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]		; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: br label [[BB3:%.*]]		; CHECK-NEXT: br label [[BB3:%.*]]
; CHECK: bb2:		; CHECK: bb2:
; CHECK-NEXT: br label [[BB3]]		; CHECK-NEXT: br label [[BB3]]
; CHECK: bb3:		; CHECK: bb3:
; CHECK-NEXT: store i32 0, i32* [[P]]		; CHECK-NEXT: store i32 0, i32* [[P:%.*]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
store i32 1, i32* %P		store i32 1, i32* %P
br i1 true, label %bb1, label %bb2		br i1 true, label %bb1, label %bb2
bb1:		bb1:
br label %bb3		br label %bb3
bb2:		bb2:
br label %bb3		br label %bb3
(23 lines not shown)	bb2:
br label %bb3		br label %bb3
bb3:		bb3:
ret void		ret void
}		}


define void @test7(i32* noalias %P, i32* noalias %Q) {		define void @test7(i32* noalias %P, i32* noalias %Q) {
; CHECK-LABEL: @test7(		; CHECK-LABEL: @test7(
; CHECK-NEXT: store i32 1, i32* [[Q:%.*]]
; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]		; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[P:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[P:%.*]]
; CHECK-NEXT: br label [[BB3:%.*]]		; CHECK-NEXT: br label [[BB3:%.*]]
; CHECK: bb2:		; CHECK: bb2:
; CHECK-NEXT: br label [[BB3]]		; CHECK-NEXT: br label [[BB3]]
; CHECK: bb3:		; CHECK: bb3:
; CHECK-NEXT: store i32 0, i32* [[Q]]		; CHECK-NEXT: store i32 0, i32* [[Q:%.*]]
; CHECK-NEXT: store i32 0, i32* [[P]]		; CHECK-NEXT: store i32 0, i32* [[P]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
store i32 1, i32* %Q		store i32 1, i32* %Q
br i1 true, label %bb1, label %bb2		br i1 true, label %bb1, label %bb2
bb1:		bb1:
load i32, i32* %P		load i32, i32* %P
br label %bb3		br label %bb3
(36 lines not shown)
bb1:		bb1:
br label %bb3		br label %bb3
bb2:		bb2:
ret void		ret void
bb3:		bb3:
store i32 0, i32* %P		store i32 0, i32* %P
ret void		ret void
}		}

		; We cannot eliminate `store i32 0, i32* %P`, as it is read by the later load.
		; Make sure that we check the uses of `store i32 1, i32* %P.1 which does not
		; alias %P. Note that uses point to the first def that may alias.
		define void @overlapping_read(i32* %P) {
		; CHECK-LABEL: @overlapping_read(
		; CHECK-NEXT: store i32 0, i32* [[P:%.*]]
		; CHECK-NEXT: [[P_1:%.*]] = getelementptr i32, i32* [[P]], i32 1
		; CHECK-NEXT: store i32 1, i32* [[P_1]]
		; CHECK-NEXT: [[P_64:%.*]] = bitcast i32* [[P]] to i64*
		; CHECK-NEXT: [[LV:%.*]] = load i64, i64* [[P_64]]
		; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
		; CHECK: bb1:
		; CHECK-NEXT: br label [[BB3:%.*]]
		; CHECK: bb2:
		; CHECK-NEXT: ret void
		; CHECK: bb3:
		; CHECK-NEXT: store i32 2, i32* [[P]]
		; CHECK-NEXT: ret void
		;
		store i32 0, i32* %P
		%P.1 = getelementptr i32, i32* %P, i32 1
		store i32 1, i32* %P.1

		%P.64 = bitcast i32* %P to i64*
		%lv = load i64, i64* %P.64
		br i1 true, label %bb1, label %bb2
		bb1:
		br label %bb3
		bb2:
		ret void
		bb3:
		store i32 2, i32* %P
		ret void
		}
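
The comment on `@overlapping_read` comes down to byte ranges: the `i64` load at `%P` reads 8 bytes, so it reads the bytes written by `store i32 0, i32* %P` even though the intervening store goes to `%P.1`. The following is a minimal sketch of that overlap test, not the check this patch implements. `AccessRange` and `rangesOverlap` are hypothetical names, and the sketch assumes all accesses are expressed as byte offsets from `%P`, with `i32` taking 4 bytes and `i64` taking 8.

```cpp
#include <cstdint>

// Hypothetical model of a memory access as the half-open byte range
// [Offset, Offset + Size), relative to a shared base pointer.
struct AccessRange {
  std::int64_t Offset; // first byte accessed
  std::uint64_t Size;  // number of bytes accessed
};

// Returns true if the two accesses share at least one byte.
bool rangesOverlap(const AccessRange &A, const AccessRange &B) {
  const std::int64_t AEnd = A.Offset + static_cast<std::int64_t>(A.Size);
  const std::int64_t BEnd = B.Offset + static_cast<std::int64_t>(B.Size);
  return A.Offset < BEnd && B.Offset < AEnd;
}
```

In this model the load covers `{0, 8}`, which overlaps both the store to `%P` at `{0, 4}` and the store to `%P.1` at `{4, 4}`, while those two stores do not overlap each other.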

llvm/test/Transforms/DeadStoreElimination/MSSA/operand-bundles.ll

				; XFAIL: *
	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s

	declare noalias i8* @malloc(i64) "malloc-like"			declare noalias i8* @malloc(i64) "malloc-like"

	declare void @foo()			declare void @foo()
	declare void @bar(i8*)			declare void @bar(i8*)

	define void @test() {			define void @test() {
	(47 lines not shown)

llvm/test/Transforms/DeadStoreElimination/MSSA/simple-todo.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; XFAIL: *
	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s
	; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -aa-pipeline=basic-aa -passes=dse -enable-dse-memoryssa -S | FileCheck %s
	target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"			target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

	declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) nounwind			declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) nounwind
	declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture, i8, i64, i32) nounwind			declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture, i8, i64, i32) nounwind
	declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind			declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind
	declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind			declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32) nounwind
	(409 lines not shown)

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll

(288 lines not shown)	;
tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i32 1)		tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i32 1)
tail call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %R, i64 12, i32 1)		tail call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %R, i64 12, i32 1)
ret void		ret void
}		}

; The memmove is dead, because memcpy arguments cannot overlap.		; The memmove is dead, because memcpy arguments cannot overlap.
define void @test38(i8* %P, i8* %Q, i8* %R) {		define void @test38(i8* %P, i8* %Q, i8* %R) {
; CHECK-LABEL: @test38(		; CHECK-LABEL: @test38(
; CHECK-NEXT: tail call void @llvm.memmove.p0i8.p0i8.i64(i8* [[P:%.*]], i8* [[Q:%.*]], i64 12, i1 false)		; CHECK-NEXT: tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[P:%.*]], i8* [[R:%.*]], i64 12, i1 false)
; CHECK-NEXT: tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[P]], i8* [[R:%.*]], i64 12, i1 false)
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;

tail call void @llvm.memmove.p0i8.p0i8.i64(i8* %P, i8* %Q, i64 12, i1 false)		tail call void @llvm.memmove.p0i8.p0i8.i64(i8* %P, i8* %Q, i64 12, i1 false)
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %R, i64 12, i1 false)		tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %R, i64 12, i1 false)
ret void		ret void
}		}

; The memmove is dead, because memcpy arguments cannot overlap.		; The memmove is dead, because memcpy arguments cannot overlap.
define void @test38_atomic(i8* %P, i8* %Q, i8* %R) {		define void @test38_atomic(i8* %P, i8* %Q, i8* %R) {
; CHECK-LABEL: @test38_atomic(		; CHECK-LABEL: @test38_atomic(
; CHECK-NEXT: tail call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 [[P:%.*]], i8* align 1 [[Q:%.*]], i64 12, i32 1)		; CHECK-NEXT: tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 [[P:%.*]], i8* align 1 [[R:%.*]], i64 12, i32 1)
; CHECK-NEXT: tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 [[P]], i8* align 1 [[R:%.*]], i64 12, i32 1)
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;

tail call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i32 1)		tail call void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i32 1)
tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %R, i64 12, i32 1)		tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %R, i64 12, i32 1)
ret void		ret void
}		}

(54 lines not shown)
}		}

; I think this case is currently handled incorrectly by memdeps dse		; I think this case is currently handled incorrectly by memdeps dse
; throwing should leave store i32 1, not remove from the free.		; throwing should leave store i32 1, not remove from the free.
declare void @free(i8* nocapture)		declare void @free(i8* nocapture)
define void @test41(i32* noalias %P) {		define void @test41(i32* noalias %P) {
; CHECK-LABEL: @test41(		; CHECK-LABEL: @test41(
; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P:%.*]] to i8*		; CHECK-NEXT: [[P2:%.*]] = bitcast i32* [[P:%.*]] to i8*
		; CHECK-NEXT: store i32 1, i32* [[P]]
; CHECK-NEXT: call void @unknown_func()		; CHECK-NEXT: call void @unknown_func()
		; CHECK-NEXT: store i32 2, i32* [[P]]
; CHECK-NEXT: call void @free(i8* [[P2]])		; CHECK-NEXT: call void @free(i8* [[P2]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%P2 = bitcast i32* %P to i8*		%P2 = bitcast i32* %P to i8*
store i32 1, i32* %P		store i32 1, i32* %P
call void @unknown_func()		call void @unknown_func()
store i32 2, i32* %P		store i32 2, i32* %P
call void @free(i8* %P2)		call void @free(i8* %P2)
(32 lines not shown)


Diff 243504
