This is an archive of the discontinued LLVM Phabricator instance.

Differential D73763

[DSE] Lift post-dominance restriction.
Abandoned · Public

Authored by fhahn on Jan 30 2020, 6:41 PM.

Details

Reviewers

dmgreen
bryant
asbirlea
Tyker
efriedma
george.burgess.iv

Summary

To eliminate a store, we must ensure that the stored location is not read
before the killing def on any path to a function exit. Currently we use
post-dominance to ensure that the killing def is executed on each path
from the killed def. But this excludes cases where there are no other
reads before another overwriting def on other paths. An example can be
found below. The first store (store i32 1, i32* %P) is dead, but neither
of the killing stores post-dominates it.

This patch adds support for such cases in a relatively straightforward
way. When checking for read clobbers, we already explore all uses. If
there are any reads before clobbering writes along any path to an exit,
we will check them. We can stop adding uses to the worklist for
MemoryDefs that completely overwrite the original location.

The only remaining problem is accesses after returning from the
function. But I think we can make those reads explicit in the MemorySSA
by introducing an additional MemoryUse that clobbers all locations
visible to the caller at each exit. With those additional MemoryUses,
the existing algorithms should handle those cases naturally as well.

Currently the patch uses a hack to create those MemoryUses: it just
introduces a readonly function call at the end of each function and
treats it as clobbering all objects that may be visible to the caller.
If the solution makes sense, I'll find a proper solution for the hack.

define void @test4(i32* noalias %P, i1 %c1) {
  store i32 1, i32* %P
  br i1 %c1, label %bb1, label %bb2

bb1:
  store i32 0, i32* %P
  br label %bb5

bb2:
  store i32 3, i32* %P
  br label %bb5

bb5:
  call void @use(i32* %P)
  ret void
}
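
The walk described in the summary can be illustrated with a small standalone sketch. This is a toy model, not LLVM code: `Effect`, `Block`, `storeIsDead` and `makeTest4` are hypothetical names, each block is reduced to its first relevant access to a single location, and the exit read is modeled by a `visibleToCaller` flag rather than an inserted MemoryUse.

```cpp
#include <cassert>
#include <vector>

// Toy model (not LLVM API): what a block does first to the tracked location.
enum class Effect { None, Read, Overwrite };

struct Block {
  Effect effect;           // first relevant access to the location in this block
  std::vector<int> succs;  // successor block indices; empty means function exit
};

// Returns true if a store at the end of block `start` is dead: on every path
// leaving `start`, the location is completely overwritten before it is read.
// If `visibleToCaller` is set, reaching an exit counts as a read.
bool storeIsDead(const std::vector<Block> &cfg, int start, bool visibleToCaller) {
  std::vector<bool> visited(cfg.size(), false);
  std::vector<int> worklist;

  // Queues the successors of `b`; returns false if `b` is an exit that
  // exposes the stored value to the caller.
  auto pushSuccs = [&](int b) -> bool {
    if (cfg[b].succs.empty())
      return !visibleToCaller;
    for (int s : cfg[b].succs)
      worklist.push_back(s);
    return true;
  };

  if (!pushSuccs(start))
    return false;
  while (!worklist.empty()) {
    int b = worklist.back();
    worklist.pop_back();
    if (visited[b])
      continue;
    visited[b] = true;
    if (cfg[b].effect == Effect::Read)
      return false;  // read before any overwrite on this path
    if (cfg[b].effect == Effect::Overwrite)
      continue;      // complete overwrite: stop exploring this path
    if (!pushSuccs(b))
      return false;
  }
  return true;
}

// CFG shape of @test4 above: entry -> {bb1, bb2} -> bb5.
std::vector<Block> makeTest4(Effect bb2Effect) {
  return {
      {Effect::Overwrite, {1, 2}},  // entry: the store under test
      {Effect::Overwrite, {3}},     // bb1: store i32 0
      {bb2Effect, {3}},             // bb2: store i32 3 in the original
      {Effect::Read, {}},           // bb5: call @use, then ret
  };
}
```

With both branches overwriting, `storeIsDead(makeTest4(Effect::Overwrite), 0, true)` reports the first store as dead even though neither overwrite post-dominates it. Dropping the overwrite in bb2 (`Effect::None`) lets the read in bb5 be reached, so the store has to stay.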

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. · Jan 30 2020, 6:41 PM

Herald added a project: Restricted Project. · Jan 30 2020, 6:41 PM

Herald added subscribers: george.burgess.iv, hiraditya.

Harbormaster failed remote builds in B45411: Diff 241628! · Jan 30 2020, 6:42 PM

fhahn mentioned this in D72700: [DSE] Add first version of MemorySSA-backed DSE (Bottom up walk). · Jan 30 2020, 6:45 PM

Currently the patch uses a hack to create those MemoryUses: it just introduces a readonly function call at the end of each function

I'm not deeply familiar with the MemorySSA datastructures, but that makes sense. I'm trying to think if any other passes care, though. I can't come up with anything off the top of my head, but maybe I'm forgetting something.

In D73763#1852269, @efriedma wrote:

Currently the patch uses a hack to create those MemoryUses: it just introduces a readonly function call at the end of each function

I'm not deeply familiar with the MemorySSA datastructures, but that makes sense. I'm trying to think if any other passes care, though. I can't come up with anything off the top of my head, but maybe I'm forgetting something.

Currently the pass cleans up the artificial uses it introduces at the end, so other passes should not be impacted, unless I am missing something. The fact that they only alias objects visible to the caller after the function returns is only modelled directly in DSE and leaving them around could potentially slightly pessimize other passes.

fhahn added a parent revision: D72148: [DSE] Support traversing MemoryPhis. · Feb 3 2020, 1:35 PM

Rebase after latest updates to earlier change.

Harbormaster failed remote builds in B45971: Diff 243261! · Feb 7 2020, 12:26 PM

Ping. Rebased after D72700 landed.

Harbormaster completed remote builds in B46087: Diff 243556. · Feb 10 2020, 8:11 AM

jrmuizel added a subscriber: jrmuizel. · Feb 11 2020, 5:40 AM

ping

This seems like a great approach to me.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1596	The comment seems outdated. I think that with this patch "dominating" will no longer be true, and a MemoryPhi does not seem to cause a bail-out.

Disclaimer: I haven't looked at the approach here.
Have you seen the MustBeExecutedContextExplorer and the path exploration it allows (D65593)?
Sounds like this might actually be applicable here. If the chosen approach is superior or more natural feel free to say so.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1485	I'm not sure this is allowed in a function pass.

uenoku added a subscriber: uenoku. · Feb 29 2020, 5:51 PM

Rebased, update comment.

In D73763#1899610, @jdoerfert wrote:

Disclaimer: I haven't looked at the approach here.
Have you seen the MustBeExecutedContextExplorer and the path exploration it allows (D65593)?
Sounds like this might actually be applicable here. If the chosen approach is superior or more natural feel free to say so.

We use MemorySSA traversals here, which allows us to only check the relevant linked memory instructions, without needing to explore the CFG. I had a brief look at MustExecute and I think the MemorySSA traversal is better suited to the problem at hand, as MemorySSA directly links the 'interesting' instructions (although I might miss something about MustExecute).

Also, for a killing store, we need to find an overwriting store that must execute when the other store executes. But we also have to check for any aliasing reads that may execute in between.

fhahn marked 2 inline comments as done. · Mar 2 2020, 4:21 AM

fhahn added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1485	Do you mean adding a new global or adding a function attribute to the added global? Ideally function passes would only modify things in the function scope, but I think adding a new global is quite common, as most function passes that add new calls also may need to add a declaration of the called function. As for adding the attribute, this needs to definitely change before submitting! I think we can either use an intrinsic (with the right attributes) or model it directly in MemorySSA, whatever option is preferred.
1596	Yes, I've rebased the patch and updated the comment as well.

Harbormaster completed remote builds in B47761: Diff 247596. · Mar 2 2020, 5:05 AM

ping.

Updated to use an intrinsic and fix a crash where AA with the new pass manager returns ModRef for our inserted read-only functions. As mentioned earlier, it could alternatively be modeled in MemorySSA directly, if preferred.

jdoerfert added inline comments. · Mar 6 2020, 1:59 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1485	I thought there was a new function created, sorry. I have the feeling this "exit read" should be part of MSSAs functionality (as it seems tied to MSSA and reusable) but I don't have a strong opinion about it.

Harbormaster completed remote builds in B48390: Diff 248817. · Mar 6 2020, 2:23 PM

Ping. @asbirlea, what do you think about modeling reads at the end of the function? Should it be done directly in MSSA?

Rebased after landing MemoryPhi support, pre-committed new tests.

fhahn mentioned this in rGece6cf0fa566: [DSE,MSSA] Precommit additional tests for D73763. · Mar 20 2020, 7:00 AM

Harbormaster completed remote builds in B49878: Diff 251621. · Mar 20 2020, 7:32 AM

ping

I don't think MemorySSA should be adding that generic intrinsic. If we need such a hack at least contain it to within the pass, not an analysis used throughout the pass pipeline.
+@george.burgess.iv for his thoughts on this too.

llvm/include/llvm/IR/Intrinsics.td
1160 ↗	(On Diff #251621)	I wouldn't use this name. As far as I see, it has nothing to do with MSSA, it's just a generic intrinsic that models a read-only function call. This looks more akin to `int_donothing_mayread`.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1473	Cleanup PDT if it's not used anymore?
1516	If this intrinsic is read only, then assert a MemoryUse was created?

asbirlea added a reviewer: george.burgess.iv. · Mar 26 2020, 1:01 PM

Thanks for taking a look!

Renamed intrinsic, removed PDT, added comment about why calls are inserted at the end of a function.

llvm/include/llvm/IR/Intrinsics.td
1160 ↗	(On Diff #251621)	Sounds good to me.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1473	Yes, it should indeed be removed!
1516	There is some weird behavior with the new pass manager, in which AA fails to infer writenone for the newly added call to the intrinsic. I am not sure exactly what is going on there, I'll take a closer look if the intrinsic approach is the way forward.

Harbormaster failed remote builds in B50615: Diff 252971! · Mar 26 2020, 2:11 PM

Thanks for working on this!

+1 to the general idea of "adding a read-everything intrinsic doesn't smell great," and a further +1 for containing it to only this pass if we do end up needing it.

This is the first time I'm glancing at this pass, so I'm going to err a bit on the side of overcommunication in hopes of making misunderstandings on my side more obvious. :)

I gather that getDomMemoryDef, given two defs A (Current) and B (KillingDef):

1. Determines some Def/Phi that dominates A, C.
2. Walks all of the MemoryAccesses that're transitively users of C, stopping at full overwrites of C + B.
3. If any of those MemoryAccesses is a potential read of A, gives up. Otherwise, returns A

If we remove the postdom restriction, I'm uncertain about how this algo will scale. It used to be that all paths necessarily terminated at B, but now it looks like we'll potentially walk a ton more? e.g.,

bool branch(); // inaccessiblememonly

{
  // 1 = MemoryDef(LOE) -- DomAccess
  *a = 1;
  if (branch()) {
    // 2 = MemoryDef(1) -- The thing we want to replace it with
    *a = 2;
  }
  // 3 = MemoryPhi(2, 1)
  // Every Phi|Def below here is now walked (?)
}

To be clear, I'm not saying this walking isn't necessary; just that it now happens AFAICT.

In any case, the thing I see this new intrinsic standing in for is an oracle that responds to "here's the set of blocks we've observed complete overwrites in. Is there any way for us to exit this function without going through one of those blocks, starting from A->getBlock()?" This is an expensive problem, but I think we can simplify it substantially, *if* my comment about the extra walking above is correct. I'd imagine we could do it essentially by constructing a sort-of def frontier for each ret terminator. In other words, a DenseSet<MemoryAccess *> // (MemoryDef || MemoryPhi) where membership in the set == "there exists a path where this Def/Phi is the final Def/Phi before we hit a ret." Since lifting the postdom restriction makes us explore every Def/Phi reachable from C (stopping at full overwrites, and stopping after N steps), we should hopefully be able to consult that set in the getDomMemoryDef loop to get a similar effect to the fake read-everything calls?
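
The "def frontier" idea from the paragraph above can be sketched in standalone form. This is a toy model and not the MemorySSA API: `DefBlock`, `exitFrontier` and `makeEarlyReturnExample` are hypothetical names, defs and phis are plain integer IDs listed per block in program order, and the caller is assumed to have placed phi IDs in join blocks already.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model (not LLVM API) of a basic block for the frontier computation.
struct DefBlock {
  std::vector<int> defs;   // IDs of def/phi-like accesses, in program order
  std::vector<int> preds;  // predecessor block indices
  bool returns;            // true if the block ends in a ret
};

// Collects every def/phi ID for which some path exists where it is the final
// def/phi before a ret: walk backwards from each returning block and stop at
// the first block that contains a def.
std::set<int> exitFrontier(const std::vector<DefBlock> &cfg) {
  std::set<int> frontier;
  std::vector<bool> visited(cfg.size(), false);
  std::vector<int> worklist;
  for (std::size_t b = 0; b < cfg.size(); ++b)
    if (cfg[b].returns)
      worklist.push_back(static_cast<int>(b));

  while (!worklist.empty()) {
    int b = worklist.back();
    worklist.pop_back();
    if (visited[b])
      continue;
    visited[b] = true;
    if (!cfg[b].defs.empty()) {
      frontier.insert(cfg[b].defs.back());  // last def in this block wins
      continue;
    }
    for (int p : cfg[b].preds)  // no def here: keep walking backwards
      worklist.push_back(p);
  }
  return frontier;
}

// A function that stores, may return early, and otherwise stores again.
std::vector<DefBlock> makeEarlyReturnExample() {
  return {
      {{1}, {}, false},  // entry: def 1, then a conditional branch
      {{}, {0}, true},   // early return, no defs of its own
      {{2}, {0}, true},  // def 2, then ret
  };
}
```

For the early-return shape, the result contains both def 1 and def 2: def 1 has a later def as a user, but it is still the last def on the path through the early return. A pass could consult such a set during its walk instead of inserting artificial read-everything calls, under the assumption that the walk already visits every reachable def or phi.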

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1596	(did this update get dropped accidentally? This comment still appears to say "Find a MemoryDef, ...")
1647	tangential to this patch: should we be ignoring `U`s which are `MemoryDefs` whose only Use of `Acc` is as their `getOptimized()` operand? Without that, it's not immediately obvious to me how we aren't overly conservative in situations like:

{
  // 1 = MemoryDef(LOE) -- this is DomAccess
  *a = 1;
  // 2 = MemoryDef(1)
  *b = 2;
  // 3 = MemoryDef(2) -- this is the obvious kill
  *a = 3;
  // 4 = MemoryDef(3) (optimized to 2)
  *c = 4;
  // MemoryUse(4) -- AA says MayAlias for `(d, a)`, so `isReadClobber()` should be `true` if we visit this?
  if (*d) {
    // ...
  }
}

In D73763#1951432, @george.burgess.iv wrote:

Thanks for working on this!

+1 to the general idea of "adding a read-everything intrinsic doesn't smell great," and a further +1 for containing it to only this pass if we do end up needing it.

Regarding the intrinsic, I am not sure if it is really needed, I guess it would be enough to check if we reached the end of our walk for each path. I'll try that tomorrow.

This is the first time I'm glancing at this pass, so I'm going to err a bit on the side of overcommunication in hopes of making misunderstandings on my side more obvious. :)

Thank you very much for taking a look!

I gather that getDomMemoryDef, given two defs A (Current) and B (KillingDef):

Determines some Def/Phi that dominates A, C.

Walks all of the MemoryAccesses that're transitively users of C, stopping at full overwrites of C + B.

If any of those MemoryAccesses is a potential read of A, gives up. Otherwise, returns A

If we remove the postdom restriction, I'm uncertain about how this algo will scale. It used to be that all paths necessarily terminated at B, but now it looks like we'll potentially walk a ton more? e.g.,
bool branch(); // inaccessiblememonly

{
  // 1 = MemoryDef(LOE) -- DomAccess
  *a = 1;
  if (branch()) {
    // 2 = MemoryDef(1) -- The thing we want to replace it with
    *a = 2;
  }
  // 3 = MemoryPhi(2, 1)
  // Every Phi|Def below here is now walked (?)
}
To be clear, I'm not saying this walking isn't necessary; just that it now happens AFAICT.

Yes, the summary above is correct. Unfortunately, removing the post-dominance restriction requires us to explore potentially much further.

The main property we need to check is the following: given MemoryDefs A and B, where A dominates B and B completely overwrites A, we can eliminate A if there are no accesses that may read A before B overwrites it. Only considering defs where B post-dominates A restricts the paths we have to check to the paths 'between' A and B. Without requiring post-dominance, we have to check the property along all paths from A to the exit. In particular, the new case this allows us to eliminate is when A is not read on any paths that do not go through B.

In any case, the thing I see this new intrinsic standing in for is an oracle that responds to "here's the set of blocks we've observed complete overwrites in. Is there any way for us to exit this function without going through one of those blocks, starting from A->getBlock()?" This is an expensive problem, but I think we can simplify it substantially, *if* my comment about the extra walking above is correct.

I think there are 2 cases to distinguish:

1. For accesses to non-alloca objects, this matches what the intrinsic achieves I think. We can only eliminate stores to objects that are visible after the function returns, if they are overwritten along all paths to the exit. So if we have determined a set of blocks that overwrite A (and there are no reads in between), we could check if all paths to the exit from A must go through one of the overwriting blocks. I think that matches your suggestion.
2. For objects that are not visible to the caller (e.g. alloca), the intrinsic achieves something slightly different: Given A and B as above, we can eliminate A, if there are no reads of A between A and any of the exits from A. We can stop walking any paths that contain overwrites of A.

I think 1. could be improved as you suggested. If we want to cover 2. I think we have to check all potentially aliasing access along all paths to the exit.

The approach I tried to follow to bring up MemorySSA-backed DSE is to add support for additional cases in smallish steps to have a baseline (and compile-time is controlled by cut-offs) and subsequently improve the parts that cause issues in practice. For example, lift the post-dominance restriction with potentially excessive walking, followed by patches to implement the special handling for the non-alloca case as suggested and future performance improvements (e.g. there is plenty of potential for caching, like in D75025)? That would also allow us to quantify the impact of improvements. What do you think?

Avoid inserting calls by pre-computing set of the last memorydefs or phis before a function exit.

fhahn marked an inline comment as done. · Apr 1 2020, 12:07 PM

fhahn added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1647	Oh, the optimized access is also added to the uses of `2 = MemoryDef` here? Yes that would be overly conservative. Is there a way to force MemorySSA to be optimized before we run DSE, for testing? I'll put up a separate fix.

Harbormaster failed remote builds in B51326: Diff 254267! · Apr 1 2020, 12:22 PM

I think there are 2 cases to distinguish:

For accesses to non-alloca objects, this matches what the intrinsic achieves I think. We can only eliminate stores to objects that are visible after the function returns, if they are overwritten along all paths to the exit. So if we have determined a set of blocks that overwrite A (and there are no reads in between), we could check if all paths to the exit from A must go through one of the overwriting blocks. I think that matches your suggestion.

Sounds like what I was thinking, yeah.

For objects that are not visible to the caller (e.g. alloca), the intrinsic achieves something slightly different: Given A and B as above, we can eliminate A, if there are no reads of A between A and any of the exits from A. We can stop walking any paths that contain overwrites of A.

I'm confused about why allocas need special treatment. This new set that we're building is essentially a fast way to answer the question "are there potentially unknowable reads after this Def/Phi?" If we know memory (e.g., from an alloca) is unreadable outside of the function, I'd hope we can ignore the set entirely, like this patch is doing now? Baked into my suggestion was the assumption that we were already:

check[ing] all potentially aliasing access along all paths to the exit

...since I think the transitive User walking we're doing inside of getDomMemoryDef should accomplish that?

In any case, if you're confident that this works for #1, I'm content to move discussion about "why don't things work as they are now," to a patch that doesn't block this one.

The approach I tried to follow to bring up MemorySSA-backed DSE is to add support for additional cases in smallish steps to have a baseline (and compile-time is controlled by cut-offs) and subsequently improve the parts that cause issues in practice. For example, lift the post-dominance restriction with potentially excessive walking, followed by patches to implement the special handling for the non-alloca case as suggested and future performance improvements (e.g. there is plenty of potential for caching, like in D75025)? That would also allow us to quantify the impact of improvements. What do you think?

I like incremental approaches when they're possible, so I'm happy with this. :)

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1468	that is defs or phis without any users that are defs or phis.

This isn't quite complete; consider:

bool bar(); // inaccessiblememonly
void foo(int *a) {
  // 1 = MemoryDef(LOE)
  *a = 1;
  if (bar())
    return;
  // 2 = MemoryDef(1)
  *a = 2;
}

Despite having `2` as a `User`, we'd still want `1` to be in this set, since depending on the way the `if` goes, it's the last Def/Phi before the function ends.
1647	Is there a way to force MemorySSA to be optimized before we run DSE, for testing?

I'm unsure offhand. @asbirlea would likely know if there is. If there's not and you'd find it useful, a debugging-only pass that boils down to `for (MemoryAccess *MA : Fn) { MSSA->getWalker()->getClobberingMemoryAccess(MA, {}); }` should force that. We preoptimize `Use`s, but not `Def`s.

In D73763#1956347, @george.burgess.iv wrote:

I think there are 2 cases to distinguish:

For accesses to non-alloca objects, this matches what the intrinsic achieves I think. We can only eliminate stores to objects that are visible after the function returns, if they are overwritten along all paths to the exit. So if we have determined a set of blocks that overwrite A (and there are no reads in between), we could check if all paths to the exit from A must go through one of the overwriting blocks. I think that matches your suggestion.

Sounds like what I was thinking, yeah.

For objects that are not visible to the caller (e.g. alloca), the intrinsic achieves something slightly different: Given A and B as above, we can eliminate A, if there are no reads of A between A and any of the exits from A. We can stop walking any paths that contain overwrites of A.

I'm confused about why allocas need special treatment. This new set that we're building is essentially a fast way to answer the question "are there potentially unknowable reads after this Def/Phi?" If we know memory (e.g., from an alloca) is unreadable outside of the function, I'd hope we can ignore the set entirely, like this patch is doing now? Baked into my suggestion was the assumption that we were already:

Ah right, I think I was not sure if your suggestion also applied to the alloca case in the previous comment.

IIUC, the additional set would be used for the non-alloca cases, as we have to ensure that there are overwrites along all paths to the exit. For allocas, we have to explore all access along all paths to function exits, as this patch currently does. What I meant to say in my previous comment was that I think we have to stick to the walking as in the patch, which is also what your last comment said, unless I am missing something :)

In any case, if you're confident that this works for #1, I'm content to move discussion about "why don't things work as they are now," to a patch that doesn't block this one.

The discussion so far has been extremely helpful, thanks! I think it would make sense to address the alloca/non-alloca separately. If you are happy with the direction, I would update this patch to only handle the alloca case (basically this patch modulo the intrinsic/LastDefOrPhi) and address the non-alloca in a subsequent patch.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1468	Of course, I remember why the intrinsic was so convenient to start with again! Anyways, I plan to split up the patch and probably look into using the suggested set straight away instead for the cases that required LastPhiOrDef.

IIUC, the additional set would be used for the non-alloca cases, as we have to ensure that there are overwrites along all paths to the exit.

Yup!

For allocas, we have to explore all access along all paths to function exits, as this patch currently does. What I meant to say in my previous comment was that I think we have to stick to the walking as in the patch, which is also what your last comment said, unless I am missing something :)

Ah, I see. Yeah, agreed :)

If you are happy with the direction, I would update this patch to only handle the alloca case

SGTM

fhahn mentioned this in D77736: [DSE] Lift post-dominance for objs not accessible in caller. · Apr 8 2020, 8:29 AM

In D73763#1964635, @george.burgess.iv wrote:

If you are happy with the direction, I would update this patch to only handle the alloca case

SGTM

I put up a patch that only handles the alloca (and alloca like non-escaping) case: D77736

I'll try to get an update on the second case as well soonish.

fhahn mentioned this in rGcf9ee49b4d7f: [DSE] Lift post-dominance for objs not accessible in caller. · Apr 15 2020, 3:48 AM

vsapsai added a subscriber: vsapsai. · Apr 16 2020, 12:27 PM

vsapsai added inline comments.

llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll
18	It would be great to update the comment to match the new IR, i.e., make `b` a function parameter instead of a local variable.

I've put up D78932 which implements eliminating stores to objects that are visible to the caller along the lines of @george.burgess.iv's suggestion

llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll
18	Thanks will do when I submit the test change!

fhahn mentioned this in D78932: [DSE,MSSA] Relax post-dom restriction for objs visible after return. · May 4 2020, 11:17 AM

fhahn mentioned this in D119760: [DSE] Fall back to CFG scan for unreachable terminators. · Feb 15 2022, 12:08 PM

Revision Contents

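Path                                                                        Size
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp                         66 lines
llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll   5 lines
llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll    3 lines
llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-multipath.ll      74 lines
llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-overlap.ll        114 lines
llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-simple.ll         7 lines
llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll                    2 lines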

Diff 243556

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

[1,459 lines not shown] struct DSEState {
    // function returns.
    SmallPtrSet<const Value *, 16> InvisibleToCaller;
    // Keep track of blocks with throwing instructions not modeled in MemorySSA.
    SmallPtrSet<BasicBlock *, 16> ThrowingBlocks;
    // Post-order numbers for each basic block. Used to figure out if memory
    // accesses are executed before another access.
    DenseMap<BasicBlock *, unsigned> PostOrderNumbers;

    DenseMap<BasicBlock *, InstOverlapIntervalsTy> IOLs;

+   Function *ExitUseFn;
+
    DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,
             PostDominatorTree &PDT, const TargetLibraryInfo &TLI)
        : F(F), AA(AA), MSSA(MSSA), DT(DT), PDT(PDT), TLI(TLI) {}

    static DSEState get(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,
                        DominatorTree &DT, PostDominatorTree &PDT,
                        const TargetLibraryInfo &TLI) {
      DSEState State(F, AA, MSSA, DT, PDT, TLI);
+     Module *M = F.getParent();
+     LLVMContext &Ctx = M->getContext();
+     FunctionType *FnTy = FunctionType::get(Type::getVoidTy(Ctx), {}, false);
+     State.ExitUseFn =
+         cast<Function>(M->getOrInsertFunction("____foobar", FnTy).getCallee());
+     State.ExitUseFn->addFnAttr(Attribute::ReadOnly);
+
      // Collect blocks with throwing instructions not modeled in MemorySSA and
      // alloc-like objects.
      unsigned PO = 0;
      for (BasicBlock *BB : post_order(&F)) {
        State.PostOrderNumbers[BB] = PO++;
        for (Instruction &I : *BB) {
          if (I.mayThrow() && !MSSA.getMemoryAccess(&I))
            State.ThrowingBlocks.insert(I.getParent());

          auto *MD = dyn_cast_or_null<MemoryDef>(MSSA.getMemoryAccess(&I));
          if (MD && State.MemDefs.size() < MemorySSADefsPerBlockLimit &&
              hasAnalyzableMemoryWrite(&I, TLI) && isRemovable(&I))
            State.MemDefs.push_back(MD);

          // Track alloca and alloca-like objects. Here we care about objects not
          // visible to the caller during function execution. Alloca objects are
          // invalid in the caller, for alloca-like objects we ensure that they
          // are not captured throughout the function.
          if (isa<AllocaInst>(&I) ||
              (isAllocLikeFn(&I, &TLI) && !PointerMayBeCaptured(&I, false, true)))
            State.InvisibleToCaller.insert(&I);
        }
+       if (isa<ReturnInst>(BB->getTerminator()) && DT.isReachableFromEntry(BB)) {
+         MemorySSAUpdater MSSAU(&MSSA);
+         CallInst *CI =
+             CallInst::Create(State.ExitUseFn->getFunctionType(),
+                              State.ExitUseFn, {}, BB->getTerminator());
+         MemoryAccess *NewMemAcc = MSSAU.createMemoryAccessInBB(
+             CI, nullptr, CI->getParent(), MemorySSA::End);
+         MSSAU.insertUse(cast<MemoryUse>(NewMemAcc), false);
+       }
      }

      // Treat byval or inalloca arguments the same as Allocas, stores to them are
      // dead at the end of the function.
      for (Argument &AI : F.args())
        if (AI.hasByValOrInAllocaAttr())
          State.InvisibleToCaller.insert(&AI);
      return State;
[20 lines not shown] if (auto CS = CallSite(I)) {
        }
        return None;
      }

      return MemoryLocation::getOrNone(I);
    }

    /// Returns true if \p Use completely overwrites \p DefLoc.
-   bool isCompleteOverwrite(MemoryLocation DefLoc, Instruction *UseInst) const {
+   bool isMustWriteClobber(MemoryLocation DefLoc, Instruction *UseInst) const {
      // UseInst has a MemoryDef associated in MemorySSA. It's possible for a
      // MemoryDef to not write to memory, e.g. a volatile load is modeled as a
      // MemoryDef.
      if (!UseInst->mayWriteToMemory())
        return false;

      if (auto CS = CallSite(UseInst))
        if (CS.onlyAccessesInaccessibleMemory())
          return false;

      ModRefInfo MR = AA.getModRefInfo(UseInst, DefLoc);
-     // If necessary, perform additional analysis.
-     if (isModSet(MR))
-       MR = AA.callCapturesBefore(UseInst, DefLoc, &DT);

      Optional<MemoryLocation> UseLoc = getLocForWriteEx(UseInst);
      return isModSet(MR) && isMustSet(MR) &&
             UseLoc->Size.getValue() >= DefLoc.Size.getValue();
}		}

/// Returns true if \p Use may read from \p DefLoc.		/// Returns true if \p Use may read from \p DefLoc.
bool isReadClobber(MemoryLocation DefLoc, Instruction *UseInst) const {		bool isReadClobber(MemoryLocation DefLoc, Instruction *UseInst) const {
if (!UseInst->mayReadFromMemory())		if (!UseInst->mayReadFromMemory())
return false;		return false;

if (auto CS = CallSite(UseInst))		if (auto CS = CallSite(UseInst)) {
if (CS.onlyAccessesInaccessibleMemory())		if (CS.onlyAccessesInaccessibleMemory())
return false;		return false;

		if (CS.getCalledFunction() == ExitUseFn) {
		DataLayout DL = F.getParent()->getDataLayout();
		const Value *UO = GetUnderlyingObject(DefLoc.Ptr, DL);
		/// Maybe pre-compute
		return !UO || InvisibleToCaller.find(UO) == InvisibleToCaller.end() ||
		(!isa<AllocaInst>(UO) && PointerMayBeCaptured(UO, true, false));
		}
		}

ModRefInfo MR = AA.getModRefInfo(UseInst, DefLoc);		ModRefInfo MR = AA.getModRefInfo(UseInst, DefLoc);
// If necessary, perform additional analysis.		// If necessary, perform additional analysis.
if (isRefSet(MR))		if (isRefSet(MR))
MR = AA.callCapturesBefore(UseInst, DefLoc, &DT);		MR = AA.callCapturesBefore(UseInst, DefLoc, &DT);
return isRefSet(MR);		return isRefSet(MR);
}		}

// Find a MemoryDef writing to \p DefLoc and dominating \p Current, with no		// Find a MemoryDef writing to \p DefLoc and dominating \p Current, with no
		Tyker [Not Done]: the comment seems outdated. i think that with this patch dominating will not be true anymore. and MemoryPHI seems to not cause a bail out.
		fhahn (author) [Done]: Yes, I've rebased the patch and updated the comment as well.
		george.burgess.iv [Not Done]: (did this update get dropped accidentally? This comment still appears to say "Find a MemoryDef, ...")
// read access in between or return None otherwise. The returned value may not		// read access in between or return None otherwise. The returned value may not
// (completely) overwrite \p DefLoc. Currently we bail out when we encounter		// (completely) overwrite \p DefLoc. Currently we bail out when we encounter
// any of the following		// any of the following
// * An aliasing MemoryUse (read).		// * An aliasing MemoryUse (read).
// * A MemoryPHI.		// * A MemoryPHI.
Optional<MemoryAccess *> getDomMemoryDef(MemoryDef *KillingDef,		Optional<MemoryAccess *> getDomMemoryDef(MemoryDef *KillingDef,
MemoryAccess *Current,		MemoryAccess *Current,
MemoryLocation DefLoc,		MemoryLocation DefLoc,
Show All 19 Lines	do {
return None;		return None;

// Look for access that clobber DefLoc.		// Look for access that clobber DefLoc.
DomAccess = MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(CurrentUD,		DomAccess = MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(CurrentUD,
DefLoc);		DefLoc);
if (MSSA.isLiveOnEntryDef(DomAccess))		if (MSSA.isLiveOnEntryDef(DomAccess))
return None;		return None;

// Check if we can skip DomDef for DSE. We also require the KillingDef		// Check if we can skip DomDef for DSE.
// execute whenever DomDef executes and use post-dominance to ensure that.

MemoryDef *DomDef = dyn_cast<MemoryDef>(DomAccess);		MemoryDef *DomDef = dyn_cast<MemoryDef>(DomAccess);
if ((DomDef && canSkipDef(DomDef, DefVisibleToCaller)) ||		if ((DomDef && canSkipDef(DomDef, DefVisibleToCaller))) {
!PDT.dominates(KillingDef->getBlock(), DomDef->getBlock())) {
StepAgain = true;		StepAgain = true;
Current = DomDef->getDefiningAccess();		Current = DomDef->getDefiningAccess();
}		}

} while (StepAgain);		} while (StepAgain);

LLVM_DEBUG(dbgs() << " Checking for reads of " << *DomAccess << " ("		LLVM_DEBUG(dbgs() << " Checking for reads of " << *DomAccess << " ("
<< *cast<MemoryDef>(DomAccess)->getMemoryInst() << ")\n");		<< *cast<MemoryDef>(DomAccess)->getMemoryInst() << ")\n");

SmallSetVector<MemoryAccess *, 32> WorkList;		SmallSetVector<MemoryAccess *, 32> WorkList;
auto PushMemUses = [&WorkList](MemoryAccess *Acc) {		auto PushMemUses = [&WorkList](MemoryAccess *Acc) {
for (Use &U : Acc->uses())		for (Use &U : Acc->uses())
WorkList.insert(cast<MemoryAccess>(U.getUser()));		WorkList.insert(cast<MemoryAccess>(U.getUser()));
		george.burgess.iv [Not Done]: tangential to this patch: should we be ignoring `U`s which are `MemoryDefs` whose only Use of `Acc` is as their `getOptimized()` operand? Without that, it's not immediately obvious to me how we aren't overly conservative in situations like:
		    {
		      // 1 = MemoryDef(LOE) -- this is DomAccess
		      a = 1;
		      // 2 = MemoryDef(1)
		      b = 2;
		      // 3 = MemoryDef(2) -- this is the obvious kill
		      a = 3;
		      // 4 = MemoryDef(3) (optimized to 2)
		      c = 4;
		      // MemoryUse(4) -- AA says MayAlias for `(d, a)`, so `isReadClobber()` should be `true` if we visit this?
		      if (d) {
		        // ...
		      }
		    }
		fhahn (author) [Done]: Oh, the optimized access is also added to the uses of `2 = MemoryDef` here? Yes that would be overly conservative. Is there a way to force MemorySSA to be optimized before we run DSE, for testing? I'll put up a separate fix.
		george.burgess.iv [Not Done]:
		    > Is there a way to force MemorySSA to be optimized before we run DSE, for testing?
		    I'm unsure offhand. @asbirlea would likely know if there is. If there's not and you'd find it useful, a debugging-only pass that boils down to `for (MemoryAccess MA : Fn) { MSSA->getWalker()->getClobberingMemoryAccess(MA, {}); }` should force that. We preoptimize `Use`s, but not `Def`s.
};		};
PushMemUses(DomAccess);		PushMemUses(DomAccess);

// Check if DomDef may be read.		// Check if DomDef may be read.
for (unsigned I = 0; I < WorkList.size(); I++) {		for (unsigned I = 0; I < WorkList.size(); I++) {
MemoryAccess *UseAccess = WorkList[I];		MemoryAccess *UseAccess = WorkList[I];

LLVM_DEBUG(dbgs() << " Checking use " << *UseAccess);		LLVM_DEBUG(dbgs() << " Checking use " << *UseAccess);
Show All 34 Lines	for (unsigned I = 0; I < WorkList.size(); I++) {
// miss cases like the following		// miss cases like the following
// 1 = Def(LoE) ; <----- DomDef stores [0,1]		// 1 = Def(LoE) ; <----- DomDef stores [0,1]
// 2 = Def(1) ; (2, 1) = NoAlias, stores [2,3]		// 2 = Def(1) ; (2, 1) = NoAlias, stores [2,3]
// Use(2) ; MayAlias 2 and 1, loads [0, 3].		// Use(2) ; MayAlias 2 and 1, loads [0, 3].
// (The Use points to the first Def it may alias)		// (The Use points to the first Def it may alias)
// 3 = Def(1) ; <---- Current (3, 2) = NoAlias, (3,1) = MayAlias,		// 3 = Def(1) ; <---- Current (3, 2) = NoAlias, (3,1) = MayAlias,
// stores [0,1]		// stores [0,1]
if (MemoryDef *UseDef = dyn_cast<MemoryDef>(UseAccess)) {		if (MemoryDef *UseDef = dyn_cast<MemoryDef>(UseAccess)) {
if (!isCompleteOverwrite(DefLoc, UseInst))		int64_t InstWriteOffset, DepWriteOffset;
		auto CC = getLocForWriteEx(UseInst);
		InstOverlapIntervalsTy IOL;

		const DataLayout &DL = F.getParent()->getDataLayout();

		if (!isMustWriteClobber(DefLoc, UseInst) ||
		(CC &&
		isOverwrite(DefLoc, *CC, DL, TLI, DepWriteOffset, InstWriteOffset,
		UseInst, IOL, AA, &F) != OW_Complete)) {
		LLVM_DEBUG(dbgs() << " ... found non-aliasing MemoryDef\n");
PushMemUses(UseDef);		PushMemUses(UseDef);
}		}
}		}
		}

// No aliasing MemoryUses of DomAccess found, DomAccess is potentially dead.		// No aliasing MemoryUses of DomAccess found, DomAccess is potentially dead.
return {DomAccess};		return {DomAccess};
}		}

// Delete dead memory defs		// Delete dead memory defs
void deleteDeadInstruction(Instruction *SI) {		void deleteDeadInstruction(Instruction *SI) {
MemorySSAUpdater Updater(&MSSA);		MemorySSAUpdater Updater(&MSSA);
▲ Show 20 Lines • Show All 188 Lines • ▼ Show 20 Lines	for (unsigned I = 0; I < ToCheck.size(); I++) {
}		}
}		}
}		}

if (EnablePartialOverwriteTracking)		if (EnablePartialOverwriteTracking)
for (auto &KV : State.IOLs)		for (auto &KV : State.IOLs)
MadeChange |= removePartiallyOverlappedStores(&AA, DL, KV.second);		MadeChange |= removePartiallyOverlappedStores(&AA, DL, KV.second);

		MemorySSAUpdater MSSAU(&MSSA);
		for (auto &BB : F) {
		if (!DT.isReachableFromEntry(&BB))
		continue;
		if (isa<ReturnInst>(BB.getTerminator())) {
		Instruction *C = &*std::prev(BB.getTerminator()->getIterator());
		MSSAU.removeMemoryAccess(C);
		C->eraseFromParent();
		}
		}

return MadeChange;		return MadeChange;
}		}
} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// DSE Pass		// DSE Pass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {		PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {
▲ Show 20 Lines • Show All 100 Lines • Show Last 20 Lines
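
The read-clobber walk in getDomMemoryDef above can be hard to follow in diff form, so here is a toy model of the idea. It is only a sketch under stated assumptions, not the patch's code: `Access`, `Kind`, and `isDeadStore` are hypothetical names, alias analysis is reduced to a precomputed boolean, and a def is assumed never to read. It shows the worklist following the users of the candidate store, stopping on paths that hit a complete overwrite, and giving up as soon as any reachable access may read the stored location.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a MemorySSA access. All names are hypothetical.
enum class Kind {
  Read,          // a MemoryUse, including the artificial use at a function exit
  FullOverwrite, // a MemoryDef that completely overwrites the candidate's location
  OtherDef       // any other MemoryDef or MemoryPhi; the walk looks through it
};

struct Access {
  Kind K;
  bool ReadsDefLoc;       // precomputed stand-in for the alias query on a Read
  std::vector<int> Users; // indices of the accesses that use this access
};

// Returns true if no aliasing read is reachable from Accesses[Def] without
// first passing through a complete overwrite.
bool isDeadStore(const std::vector<Access> &Accesses, int Def) {
  std::vector<int> WorkList;
  std::vector<bool> Seen(Accesses.size(), false);

  // Queue every not-yet-visited user of the access at index Idx.
  auto PushUsers = [&](int Idx) {
    for (int U : Accesses[Idx].Users)
      if (!Seen[U]) {
        Seen[U] = true;
        WorkList.push_back(U);
      }
  };
  PushUsers(Def);

  for (std::size_t I = 0; I < WorkList.size(); ++I) {
    const int Idx = WorkList[I];
    const Access &A = Accesses[Idx];
    if (A.K == Kind::Read) {
      if (A.ReadsDefLoc)
        return false; // the stored value may be observed
      continue;       // reads are leaves, nothing to follow
    }
    if (A.K == Kind::FullOverwrite)
      continue;       // this path kills the store, stop exploring it
    PushUsers(Idx);   // look through defs that do not overwrite the location
  }
  return true;
}
```

In this model, @test4 from multiblock-multipath.ll is the candidate store with two users that are both `FullOverwrite`, so the walk never reaches the `@use` call and the store is reported dead. A store visible to the caller with one path that reaches the exit without an overwrite would reach a `Read` with `ReadsDefLoc` set and be kept.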

llvm/test/Transforms/DeadStoreElimination/MSSA/memset-missing-debugloc.ll

	Show All 9 Lines
	;			;
	; clang Debugify_Dead_Store_Elimination.cpp -Wno-c++11-narrowing -S \			; clang Debugify_Dead_Store_Elimination.cpp -Wno-c++11-narrowing -S \
	; -emit-llvm -O0 -w -Xclang -disable-O0-optnone -march=native -fdeclspec \			; -emit-llvm -O0 -w -Xclang -disable-O0-optnone -march=native -fdeclspec \
	; --target=x86_64-gnu-linux-unknown -Werror=unreachable-code -o -			; --target=x86_64-gnu-linux-unknown -Werror=unreachable-code -o -
	;			;
	; Where Debugify_Dead_Store_Elimination.cpp contains:			; Where Debugify_Dead_Store_Elimination.cpp contains:
	;			;
	; int a() {			; int a() {
	; long b[]{2, 2, 2, 2, 0};			; long b[]{2, 2, 2, 2, 0};
				vsapsai [Not Done]: It would be great to update the comment to match the new IR, i.e., make `b` a function parameter instead of a local variable.
				fhahn (author) [Done]: Thanks will do when I submit the test change!
	; if (a())			; if (a())
	; ;			; ;
	; }			; }


	define dso_local i32 @_Z1av() !dbg !7 {			define dso_local i32 @_Z1av([5 x i64]* %b) !dbg !7 {
	entry:			entry:
	%retval = alloca i32, align 4			%retval = alloca i32, align 4
	%b = alloca [5 x i64], align 16
	call void @llvm.dbg.declare(metadata [5 x i64]* %b, metadata !11, metadata !DIExpression()), !dbg !16			call void @llvm.dbg.declare(metadata [5 x i64]* %b, metadata !11, metadata !DIExpression()), !dbg !16
	%0 = bitcast [5 x i64]* %b to i8*, !dbg !16			%0 = bitcast [5 x i64]* %b to i8*, !dbg !16
	call void @llvm.memset.p0i8.i64(i8* align 16 %0, i8 0, i64 40, i1 false), !dbg !16			call void @llvm.memset.p0i8.i64(i8* align 16 %0, i8 0, i64 40, i1 false), !dbg !16
	%1 = bitcast i8* %0 to [5 x i64]*, !dbg !16			%1 = bitcast i8* %0 to [5 x i64]*, !dbg !16
	%2 = getelementptr inbounds [5 x i64], [5 x i64]* %1, i32 0, i32 0, !dbg !16			%2 = getelementptr inbounds [5 x i64], [5 x i64]* %1, i32 0, i32 0, !dbg !16
	store i64 2, i64* %2, align 16, !dbg !16			store i64 2, i64* %2, align 16, !dbg !16
	%3 = getelementptr inbounds [5 x i64], [5 x i64]* %1, i32 0, i32 1, !dbg !16			%3 = getelementptr inbounds [5 x i64], [5 x i64]* %1, i32 0, i32 1, !dbg !16
	store i64 2, i64* %3, align 8, !dbg !16			store i64 2, i64* %3, align 8, !dbg !16
	%4 = getelementptr inbounds [5 x i64], [5 x i64]* %1, i32 0, i32 2, !dbg !16			%4 = getelementptr inbounds [5 x i64], [5 x i64]* %1, i32 0, i32 2, !dbg !16
	store i64 2, i64* %4, align 16, !dbg !16			store i64 2, i64* %4, align 16, !dbg !16
	%5 = getelementptr inbounds [5 x i64], [5 x i64]* %1, i32 0, i32 3, !dbg !16			%5 = getelementptr inbounds [5 x i64], [5 x i64]* %1, i32 0, i32 3, !dbg !16
	store i64 2, i64* %5, align 8, !dbg !16			store i64 2, i64* %5, align 8, !dbg !16
	%call = call i32 @_Z1av(), !dbg !17			%call = call i32 @_Z1av([5 x i64]* %b), !dbg !17
	%tobool = icmp ne i32 %call, 0, !dbg !17			%tobool = icmp ne i32 %call, 0, !dbg !17
	br i1 %tobool, label %if.then, label %if.end, !dbg !19			br i1 %tobool, label %if.then, label %if.end, !dbg !19

	if.then: ; preds = %entry			if.then: ; preds = %entry
	br label %if.end, !dbg !19			br label %if.end, !dbg !19

	if.end: ; preds = %if.then, %entry			if.end: ; preds = %if.then, %entry
	call void @llvm.trap(), !dbg !20			call void @llvm.trap(), !dbg !20
	▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-malloc-free.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

	; XFAIL: *
	; TODO: Handling of free not implemented yet.

	; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s			; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s

	target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
	declare void @unknown_func()			declare void @unknown_func()
	declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) nounwind			declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) nounwind
	declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) nounwind			declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) nounwind
	declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind			declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind
	declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind			declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind
	▲ Show 20 Lines • Show All 431 Lines • Show Last 20 Lines

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-multipath.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -basicaa -dse -enable-dse-memoryssa -S | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

				declare void @use(i32 *)

				define void @test4(i32* noalias %P, i1 %c1) {
				; CHECK-LABEL: @test4(
				; CHECK-NEXT: br i1 [[C1:%.*]], label [[BB1:%.*]], label [[BB2:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: store i32 0, i32* [[P:%.*]]
				; CHECK-NEXT: br label [[BB5:%.*]]
				; CHECK: bb2:
				; CHECK-NEXT: store i32 3, i32* [[P]]
				; CHECK-NEXT: br label [[BB5]]
				; CHECK: bb5:
				; CHECK-NEXT: call void @use(i32* [[P]])
				; CHECK-NEXT: ret void
				;
				store i32 1, i32* %P
				br i1 %c1, label %bb1, label %bb2

				bb1:
				store i32 0, i32* %P
				br label %bb5
				bb2:
				store i32 3, i32* %P
				br label %bb5

				bb5:
				call void @use(i32* %P)
				ret void
				}

				define void @test5(i32* noalias %P) {
				; CHECK-LABEL: @test5(
				; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: store i32 0, i32* [[P:%.*]]
				; CHECK-NEXT: br label [[BB5:%.*]]
				; CHECK: bb2:
				; CHECK-NEXT: br i1 undef, label [[BB3:%.*]], label [[BB4:%.*]]
				; CHECK: bb3:
				; CHECK-NEXT: store i32 3, i32* [[P]]
				; CHECK-NEXT: br label [[BB5]]
				; CHECK: bb4:
				; CHECK-NEXT: store i32 5, i32* [[P]]
				; CHECK-NEXT: br label [[BB5]]
				; CHECK: bb5:
				; CHECK-NEXT: call void @use(i32* [[P]])
				; CHECK-NEXT: ret void
				;
				store i32 1, i32* %P
				br i1 true, label %bb1, label %bb2
				bb1:
				store i32 0, i32* %P
				br label %bb5

				bb2:
				br i1 undef, label %bb3, label %bb4

				bb3:
				store i32 3, i32* %P
				br label %bb5

				bb4:
				store i32 5, i32* %P
				br label %bb5

				bb5:
				call void @use(i32* %P)
				ret void
				}

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-overlap.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -dse -enable-dse-memoryssa %s -S | FileCheck %s


				%struct.ham = type { [3 x double], [3 x double]}

				declare void @may_throw()
				declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)

				define void @overlap1(%struct.ham* %arg, i1 %cond) {
				; CHECK-LABEL: @overlap1(
				; CHECK-NEXT: bb:
				; CHECK-NEXT: [[TMP:%.*]] = getelementptr inbounds [[STRUCT_HAM:%.*]], %struct.ham* [[ARG:%.*]], i64 0, i32 0, i64 2
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 0, i64 1
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 0, i64 0
				; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 1, i64 2
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 1, i64 1
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 1, i32 0
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB7:%.*]], label [[BB8:%.*]]
				; CHECK: bb7:
				; CHECK-NEXT: br label [[BB9:%.*]]
				; CHECK: bb8:
				; CHECK-NEXT: br label [[BB9]]
				; CHECK: bb9:
				; CHECK-NEXT: store double 1.000000e+00, double* [[TMP2]], align 8
				; CHECK-NEXT: store double 2.000000e+00, double* [[TMP1]], align 8
				; CHECK-NEXT: store double 3.000000e+00, double* [[TMP]], align 8
				; CHECK-NEXT: store double 4.000000e+00, double* [[TMP5]], align 8
				; CHECK-NEXT: store double 5.000000e+00, double* [[TMP4]], align 8
				; CHECK-NEXT: store double 6.000000e+00, double* [[TMP3]], align 8
				; CHECK-NEXT: ret void
				;
				bb:
				%tmp = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 0, i64 2
				%tmp1 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 0, i64 1
				%tmp2 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 0, i64 0
				%tmp3 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0,i32 1, i64 2
				%tmp4 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 1, i64 1
				%tmp5 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 1, i32 0
				%tmp6 = bitcast double* %tmp2 to i8*
				call void @llvm.memset.p0i8.i64(i8* nonnull align 8 dereferenceable(48) %tmp6, i8 0, i64 48, i1 false)
				br i1 %cond, label %bb7, label %bb8

				bb7: ; preds = %bb
				br label %bb9

				bb8: ; preds = %bb
				br label %bb9

				bb9: ; preds = %bb8, %bb7
				store double 1.0, double* %tmp2, align 8
				store double 2.0, double* %tmp1, align 8
				store double 3.0, double* %tmp, align 8
				store double 4.0, double* %tmp5, align 8
				store double 5.0, double* %tmp4, align 8
				store double 6.0, double* %tmp3, align 8
				ret void
				}

				define void @overlap2(%struct.ham* %arg, i1 %cond) {
				; CHECK-LABEL: @overlap2(
				; CHECK-NEXT: bb:
				; CHECK-NEXT: [[TMP:%.*]] = getelementptr inbounds [[STRUCT_HAM:%.*]], %struct.ham* [[ARG:%.*]], i64 0, i32 0, i64 2
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 0, i64 1
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 0, i64 0
				; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 1, i64 2
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 1, i64 1
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[STRUCT_HAM]], %struct.ham* [[ARG]], i64 0, i32 1, i32 0
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[TMP2]] to i8*
				; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* nonnull align 8 dereferenceable(48) [[TMP6]], i8 0, i64 48, i1 false)
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB7:%.*]], label [[BB8:%.*]]
				; CHECK: bb7:
				; CHECK-NEXT: call void @may_throw()
				; CHECK-NEXT: br label [[BB9:%.*]]
				; CHECK: bb8:
				; CHECK-NEXT: br label [[BB9]]
				; CHECK: bb9:
				; CHECK-NEXT: store double 1.000000e+00, double* [[TMP2]], align 8
				; CHECK-NEXT: store double 2.000000e+00, double* [[TMP1]], align 8
				; CHECK-NEXT: store double 3.000000e+00, double* [[TMP]], align 8
				; CHECK-NEXT: store double 4.000000e+00, double* [[TMP5]], align 8
				; CHECK-NEXT: store double 5.000000e+00, double* [[TMP4]], align 8
				; CHECK-NEXT: store double 6.000000e+00, double* [[TMP3]], align 8
				; CHECK-NEXT: ret void
				;
				bb:
				%tmp = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 0, i64 2
				%tmp1 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 0, i64 1
				%tmp2 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 0, i64 0
				%tmp3 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0,i32 1, i64 2
				%tmp4 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 1, i64 1
				%tmp5 = getelementptr inbounds %struct.ham, %struct.ham* %arg, i64 0, i32 1, i32 0
				%tmp6 = bitcast double* %tmp2 to i8*
				call void @llvm.memset.p0i8.i64(i8* nonnull align 8 dereferenceable(48) %tmp6, i8 0, i64 48, i1 false)
				br i1 %cond, label %bb7, label %bb8

				bb7: ; preds = %bb
				call void @may_throw()
				br label %bb9

				bb8: ; preds = %bb
				br label %bb9

				bb9: ; preds = %bb8, %bb7
				store double 1.0, double* %tmp2, align 8
				store double 2.0, double* %tmp1, align 8
				store double 3.0, double* %tmp, align 8
				store double 4.0, double* %tmp5, align 8
				store double 5.0, double* %tmp4, align 8
				store double 6.0, double* %tmp3, align 8
				ret void
				}

llvm/test/Transforms/DeadStoreElimination/MSSA/multiblock-simple.ll

	Show First 20 Lines • Show All 169 Lines • ▼ Show 20 Lines
	bb3:			bb3:
	ret void			ret void
	}			}


	define void @test11() {			define void @test11() {
	; CHECK-LABEL: @test11(			; CHECK-LABEL: @test11(
	; CHECK-NEXT: [[P:%.*]] = alloca i32			; CHECK-NEXT: [[P:%.*]] = alloca i32
	; CHECK-NEXT: store i32 0, i32* [[P]]
	; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]			; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: store i32 0, i32* [[P]]			; CHECK-NEXT: store i32 0, i32* [[P]]
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%P = alloca i32			%P = alloca i32
	store i32 0, i32* %P			store i32 0, i32* %P
	br i1 true, label %bb1, label %bb2			br i1 true, label %bb1, label %bb2
	bb1:			bb1:
	store i32 0, i32* %P			store i32 0, i32* %P
	br label %bb3			br label %bb3
	bb2:			bb2:
	ret void			ret void
	bb3:			bb3:
	ret void			ret void
	}			}


	define void @test12(i32* %P) {			define void @test12(i32* %P) {
	; CHECK-LABEL: @test12(			; CHECK-LABEL: @test12(
	; CHECK-NEXT: store i32 0, i32* [[P:%.*]]
	; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]			; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: store i32 1, i32* [[P]]			; CHECK-NEXT: store i32 1, i32* [[P:%.*]]
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: store i32 1, i32* [[P]]			; CHECK-NEXT: store i32 1, i32* [[P]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i32 0, i32* %P			store i32 0, i32* %P
	br i1 true, label %bb1, label %bb2			br i1 true, label %bb1, label %bb2
	bb1:			bb1:
	store i32 1, i32* %P			store i32 1, i32* %P
	br label %bb3			br label %bb3
	bb2:			bb2:
	store i32 1, i32* %P			store i32 1, i32* %P
	ret void			ret void
	bb3:			bb3:
	ret void			ret void
	}			}


	define void @test13(i32* %P) {			define void @test13(i32* %P) {
	; CHECK-LABEL: @test13(			; CHECK-LABEL: @test13(
	; CHECK-NEXT: store i32 0, i32* [[P:%.*]]
	; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]			; CHECK-NEXT: br i1 true, label [[BB1:%.*]], label [[BB2:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: store i32 1, i32* [[P]]			; CHECK-NEXT: store i32 1, i32* [[P:%.*]]
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: store i32 1, i32* [[P]]			; CHECK-NEXT: store i32 1, i32* [[P]]
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i32 0, i32* %P			store i32 0, i32* %P
	Show All 10 Lines

llvm/test/Transforms/DeadStoreElimination/MSSA/simple.ll

	Show First 20 Lines • Show All 238 Lines • ▼ Show 20 Lines
	}			}

	; Check another case like PR13547 where strdup is not like malloc.			; Check another case like PR13547 where strdup is not like malloc.
	define i8* @test25(i8* %p) nounwind {			define i8* @test25(i8* %p) nounwind {
	; CHECK-LABEL: @test25(			; CHECK-LABEL: @test25(
	; CHECK-NEXT: [[P_4:%.*]] = getelementptr i8, i8* [[P:%.*]], i64 4			; CHECK-NEXT: [[P_4:%.*]] = getelementptr i8, i8* [[P:%.*]], i64 4
	; CHECK-NEXT: [[TMP:%.*]] = load i8, i8* [[P_4]], align 1			; CHECK-NEXT: [[TMP:%.*]] = load i8, i8* [[P_4]], align 1
	; CHECK-NEXT: store i8 0, i8* [[P_4]], align 1			; CHECK-NEXT: store i8 0, i8* [[P_4]], align 1
	; CHECK-NEXT: [[Q:%.*]] = call i8* @strdup(i8* [[P]]) #6			; CHECK-NEXT: [[Q:%.*]] = call i8* @strdup(i8* [[P]]) #7
	; CHECK-NEXT: store i8 [[TMP]], i8* [[P_4]], align 1			; CHECK-NEXT: store i8 [[TMP]], i8* [[P_4]], align 1
	; CHECK-NEXT: ret i8* [[Q]]			; CHECK-NEXT: ret i8* [[Q]]
	;			;
	%p.4 = getelementptr i8, i8* %p, i64 4			%p.4 = getelementptr i8, i8* %p, i64 4
	%tmp = load i8, i8* %p.4, align 1			%tmp = load i8, i8* %p.4, align 1
	store i8 0, i8* %p.4, align 1			store i8 0, i8* %p.4, align 1
	%q = call i8* @strdup(i8* %p) nounwind optsize			%q = call i8* @strdup(i8* %p) nounwind optsize
	store i8 %tmp, i8* %p.4, align 1			store i8 %tmp, i8* %p.4, align 1
	▲ Show 20 Lines • Show All 326 Lines • Show Last 20 Lines