Page MenuHomePhabricator

[DSE] Add first version of MemorySSA-backed DSE.
AbandonedPublic

Authored by fhahn on Jan 3 2020, 6:09 AM.

Details

Summary

This patch adds a first version of a MemorySSA based DSE. It is missing
a lot of features, which will get added as follow-ups, to help to keep
the review manageable.

For each MemoryDef, we walk the MemorySSA def-use chains to try to find
a overwriting MemoryDef. During the walk, we bail out if we hit either:

  • A MemoryPhi (restriction will be lifted later).
  • An aliasing MemoryAccess.
  • Multiple MemoryDefs (which indicate a split in control flow).

The patch uses a very simple approach to guarantee that no throwing
instructions are between 2 stores: We only allow accesses to stack
objects, access that are in the same basic block if the block does not
contain any throwing instructions or accesses in functions that do
not contain any throwing instructions. This will get lifted later.

Besides adding support for the missing cases, there is plenty of additonal
potential for improvements as follow-up work, e.g. the way we visit stores
(could be just a traversal of the MemorySSA, rather than collecting them
up-front), using the alias information discovered during walking to optimize
the MemorySSA.

This is based on D40480 by Dave Green.

Diff Detail

Event Timeline

fhahn created this revision.Jan 3 2020, 6:09 AM

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Tyker added a subscriber: Tyker.Jan 3 2020, 8:29 AM
Tyker added a comment.Jan 3 2020, 2:14 PM

i haven't looked in details at your patch but i was working on something similar D72182.

fhahn added a comment.Jan 3 2020, 3:46 PM

i haven't looked in details at your patch but i was working on something similar D72182.

Interesting, thanks for sharing! I had a brief look at your patch and I think it would be great if we could combine our efforts. I tried to keep the first patches relatively straight forward for easy review and add support for more cases incrementally. It seems like your patch has propagation of non-escaping object info and checks for barriers across multiple blocks. Do you think it would make sense to use D72146 (this patch) as foundation and port the improvements from your patch? It would be great if you could have a look and let me know what you think!

There also are a few things to improve the walking in this patch: 1. deal with reaching the end of the walk, 2. handle cases with defs in multiple paths. Both should be quite straight forward to add (I have a patch for the first one nearly ready), but I think it would be great to do this in incremental steps as separate patches.

Tyker added a comment.Jan 4 2020, 2:39 AM

Do you think it would make sense to use D72146 (this patch) as foundation and port the improvements from your patch? It would be great if you could have a look and let me know what you think!

I took a look at this patch and D72148(the patch adding MemPhi traversal). they have many good ideas i haven't thought about.
but i have one major concern.
how to you plan to dealing with store being killed by multiple stores which together post-dominate the "dead" store ?
like the store in the entry block of the following:

define void @test5A(i32* %P) {
  store i32 1, i32* %P ; this is dead
  br i1 true, label %bb1, label %bb2
bb1:
  store i32 0, i32* %P
  br label %bb3
bb2:
  store i32 1, i32* %P
  br label %bb3
bb3:
  ret void
}

the design around getNextMemoryDef seems to me like it is too restrictive for the above example.

aside from that we went for different strategies for storing pass related data structures but i am not locked on my decision of having a DSEContext storing "everything".

Tyker added inline comments.Jan 4 2020, 4:10 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1703

PointerMayBeCaptured(&I, false, true)
i think we should care about return capture here. if the pointer is returned the content is observable.

fhahn added a comment.Jan 4 2020, 6:19 AM

Do you think it would make sense to use D72146 (this patch) as foundation and port the improvements from your patch? It would be great if you could have a look and let me know what you think!

I took a look at this patch and D72148(the patch adding MemPhi traversal). they have many good ideas i haven't thought about.
but i have one major concern.
how to you plan to dealing with store being killed by multiple stores which together post-dominate the "dead" store ?
like the store in the entry block of the following:

define void @test5A(i32* %P) {
  store i32 1, i32* %P ; this is dead
  br i1 true, label %bb1, label %bb2
bb1:
  store i32 0, i32* %P
  br label %bb3
bb2:
  store i32 1, i32* %P
  br label %bb3
bb3:
  ret void
}

the design around getNextMemoryDef seems to me like it is too restrictive for the above example.

Yes as it is it does not cover that case. But I think it would be relatively straight forward to extend it to return a list of MemoryDefs encountered along multiple paths. Those can then be checked for legality separately. If some of the found Defs do not eliminate the original store, we can continue walking just those Defs. To support that, it probably also makes sense to re-write getNextMemoryDef to work iteratively, like in your patch.

IMO the most productive way forward would be to get in a patch with limited but solid support as a baseline and once that is in we can work on adding missing functionality in parallel. What do you think about this approach?

Besides extending the functionality, I think we will also have to spend a fair amount of work on making sure that preceding and succeeding passes in the pipeline preserve MemorySSA, to avoid computing it from scratch just for DSE.

aside from that we went for different strategies for storing pass related data structures but i am not locked on my decision of having a DSEContext storing "everything".

Tyker added a comment.Jan 4 2020, 2:54 PM

Do you think it would make sense to use D72146 (this patch) as foundation and port the improvements from your patch? It would be great if you could have a look and let me know what you think!

I took a look at this patch and D72148(the patch adding MemPhi traversal). they have many good ideas i haven't thought about.
but i have one major concern.
how to you plan to dealing with store being killed by multiple stores which together post-dominate the "dead" store ?
like the store in the entry block of the following:

define void @test5A(i32* %P) {
  store i32 1, i32* %P ; this is dead
  br i1 true, label %bb1, label %bb2
bb1:
  store i32 0, i32* %P
  br label %bb3
bb2:
  store i32 1, i32* %P
  br label %bb3
bb3:
  ret void
}

the design around getNextMemoryDef seems to me like it is too restrictive for the above example.

Yes as it is it does not cover that case. But I think it would be relatively straight forward to extend it to return a list of MemoryDefs encountered along multiple paths. Those can then be checked for legality separately. If some of the found Defs do not eliminate the original store, we can continue walking just those Defs. To support that, it probably also makes sense to re-write getNextMemoryDef to work iteratively, like in your patch.

Ok seems reasonable, so we will change the core loop for each store will have to change in a following patch.

IMO the most productive way forward would be to get in a patch with limited but solid support as a baseline and once that is in we can work on adding missing functionality in parallel. What do you think about this approach?

you are right. i probably went a bit to far for a first patch.

Besides extending the functionality, I think we will also have to spend a fair amount of work on making sure that preceding and succeeding passes in the pipeline preserve MemorySSA, to avoid computing it from scratch just for DSE.

yeah. and we may have to do the same thing for the post dominator tree.
is there a way to have a pass preserve an analysis if it is still valid without depending on it but not updating it if it is not valid ?

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1624

this condition seems to relaxed to me. i think we can skip barrier detection for stores to non-escaping pointers only if there are killed by an end of function, not when they are killed by a full overwrite.

if the store is killed by an overwriting store, an exception thrown between the 2 stores could be caught in the same function after the overwriting store. the store we are trying to kill would not be overwritten and it will be observable for the rest of the function.

fhahn updated this revision to Diff 236258.Jan 5 2020, 12:09 PM
fhahn marked 2 inline comments as done.

Thank you very much for taking a look! I think I've addressed the comments. I also added a bunch of additional tests and comments.

[snip]

Yes as it is it does not cover that case. But I think it would be relatively straight forward to extend it to return a list of MemoryDefs encountered along multiple paths. Those can then be checked for legality separately. If some of the found Defs do not eliminate the original store, we can continue walking just those Defs. To support that, it probably also makes sense to re-write getNextMemoryDef to work iteratively, like in your patch.

Ok seems reasonable, so we will change the core loop for each store will have to change in a following patch.

+1

IMO the most productive way forward would be to get in a patch with limited but solid support as a baseline and once that is in we can work on adding missing functionality in parallel. What do you think about this approach?

you are right. i probably went a bit to far for a first patch.

Besides extending the functionality, I think we will also have to spend a fair amount of work on making sure that preceding and succeeding passes in the pipeline preserve MemorySSA, to avoid computing it from scratch just for DSE.

yeah. and we may have to do the same thing for the post dominator tree.
is there a way to have a pass preserve an analysis if it is still valid without depending on it but not updating it if it is not valid ?

Yes, PostDominators is the second thing we need to get better at preserving. IIRC all places where we already use DomTreeUpdater we basically preserve PDT as well without any additional work, besides passing in the PDT to the updater. There's getAnalysisIfAvailable (legacy pass manager)/getCachedResult (new PM) to get an analysis, only if it is already computed. The only caveat is that the legacy pass manager cannot preserve function analysis if it is followed by a function pass (IIUC).

fhahn added inline comments.Jan 5 2020, 12:14 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1624

Do you have a particular test in mind? I could not come up with a test that fails, but I've added one that *should* check for the issue.

If we have reads of a stack object in a catch block, those would be visible in the MemorySSA and we would not find a suitable MemoryDef, right? Also, catchpad/landingpad instructions are marked as mayWriteToMemory, so they are currently MemoryDefs that alias everything (due to no having any info about the aliasing location).

1703

As long as we do not use it for eliminating stores due to reaching the end of the walk, it should be fine to ignore return captures. I think all we care about here is that they are not known to the caller, so it is safe to eliminate stores even if we encounter a function that may throw.

After thinking a bit more, I think we do not need to check for capturing here at all. Function calls are modeled in MemorySSA and we currently treat them as aliasing everything. So if a pointer escapes to such a function, we would catch that through the MemorySSA.

I've renamed NonEscpaingStackObjects to make it a bit clearer and also added more explanation in comments.

Unit tests: pass. 61297 tests passed, 0 failed and 736 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Tyker added a comment.EditedJan 5 2020, 2:19 PM

Yes, PostDominators is the second thing we need to get better at preserving. IIRC all places where we already use DomTreeUpdater we basically preserve PDT as well without any additional work, besides passing in the PDT to the updater. There's getAnalysisIfAvailable (legacy pass manager)/getCachedResult (new PM) to get an analysis, only if it is already computed. The only caveat is that the legacy pass manager cannot preserve function analysis if it is followed by a function pass (IIUC).

thank you for the explaination.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1624

my bad. after further checking this is correct

1703

ok. in the with the current uses not checking the return seems correct.

but not checking at all for capture doesn't seem correct.
capturing function calls can be anywhere in the function not necessarily between the 2 stores so they may not be reached by a MemorySSA traversal. and the throw may be attributed such that it doesn't appear in MemorySSA.
removing store i8 2, i8* %call, align 1 is not legal in the following example.

define void @f() {
  %call = call i8* @malloc(i32 1)
  call void @may_capture(i8* %call)
  store i8 2, i8* %call, align 1
  call void @may_throw()
  store i8 3, i8* %call, align 1
  ret void
}

declare i8* @malloc(i32)
declare void @may_capture(i8*)
declare void @may_throw() readnone
fhahn marked an inline comment as done.Jan 5 2020, 2:41 PM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1703

Ah right, I think it basically boils down to whether readnone functions can throw or not. I think from the definition in the langref, for C/C++, they cannot throw exceptions (or modify any state visible to the caller).

But you are right, after another reading I realized that it states that there may be other mechanisms that may throw without modifying visible state in other languages. I'll bring back the check. I wonder if there's a way to allow skipping the checks for C/C++.

Tyker added inline comments.Jan 6 2020, 10:31 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1703

If all throwing instruction where in MemorySSA. We wouldn't need any checking of throwing instruction outside of the MemorySSA traversal this would simplify and most likely speedup the pass.

I think that changing MemorySSA such that throwing instruction are in MemorySSA even if they don't access memory is probably the best solution. but we need the opinion of a maintainer of MemorySSA for this.
An other solution is just to check for throwing instruction not present in MemorySSA before proceeding with the function and if there is any fallback to what I did in my patch and check it separately.

fhahn updated this revision to Diff 236425.Jan 6 2020, 11:01 AM

Add capture check again.

fhahn marked an inline comment as done.Jan 6 2020, 11:03 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1703

I think for now it's probably better to avoid adding them to MemorySSA. I've re-added the capture check.

Unit tests: pass. 61305 tests passed, 0 failed and 736 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

fhahn updated this revision to Diff 236638.Jan 7 2020, 11:15 AM

Only consider throwing instructions not modeled in the MemorySSA for ThrowingBlocks.

Unit tests: fail. 61351 tests passed, 1 failed and 736 were skipped.

failed: LLVM.Transforms/DeadStoreElimination/MSSA/combined-partial-overwrites.ll

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

fhahn updated this revision to Diff 237700.Jan 13 2020, 8:55 AM

Updated to use AA::getModRefInfo and AA::callCapturesBefore similar to DependenceAnalysis. Remove unnecessary getLocForReadEx.

Unit tests: unknown.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt

fhahn added a comment.Jan 13 2020, 4:12 PM

Ping.

I think the latest version should be in quite good shape for review. I've linked the follow-on patches which together give a nice improvement in dead stores eliminated. I'll post a patch for partial overwrite tracking tomorrow and I am also working on exploring multiple memory defs.

Some comments inline.

Re-adding the inline comment: downwards walks are inefficient. This also currently misses optimization opportunities. In order to not miss anything, all Defs need to be looked at, not only uses, adding more overhead. So it's possible replacing MemDepAnalysis with MemorySSA may not get many benefits.
Maybe do some preliminary performance testing to check trade-offs between this recursive approach, the iterative approach in Tyker's patch, or consider a bottom-up approach?

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1372

s/optimistaion/optimisation
s/Unfortately/Unfortunately

1383

s/acts a/acts as a

1393

Could you use more informative names here? Bad is vague, and there's a Next in the WalkResult below which will make the code using these confusing.
Perhaps StopWalk and ContinueWalk?

1401

Nit: Potential reduction of code duplication between this (its callsites) and hasAnalyzableMemoryWrite.

1442

This is not necessary.

1465

s/intrisnic/intrinsic

1476

Debuginfo no longer has associated MemoryDefs.
Even with a non-standard AA pipeline, MemorySSA will not create accesses for them.

1486

s/indiciates/indicates

1500

The list of uses will not always include all clobbers, even transitively.
For example, before MemorySSA optimizations, the Defs form a chain:

1 = Def(LoE)
2 = Def(1)
3 = Def(2)

Let's say that 1 and 3 are two stores where we could eliminate either one of them. The Def at 2 does not alias any of them.
Let's say some previous pass that uses MemorySSA queries the clobbering access for 2, and finds no alias, so it will optimize it to 2=Def(LoE).
Now 1 has no uses and this goes the "bail out" path.

This should only be the case for missed optimizations, not a correctness problem.
However, as a general comment, with MemorySSA, it's efficient to do walks upwards looking for clobbering accesses not downward.

1675

s/MemroySSA/MemorySSA

fhahn added a comment.Jan 14 2020, 6:31 AM

Some comments inline.

Re-adding the inline comment: downwards walks are inefficient. This also currently misses optimization opportunities. In order to not miss anything, all Defs need to be looked at, not only uses, adding more overhead. So it's possible replacing MemDepAnalysis with MemorySSA may not get many benefits.
Maybe do some preliminary performance testing to check trade-offs between this recursive approach, the iterative approach in Tyker's patch, or consider a bottom-up approach?

Thanks for taking a look! I've experimented with a bottom up walk, which tries to find MemoryDefs that may be killed by a given MemoryDef: D72700. Roughly, the approach uses getClobberingAccess to find dominating MemoryDefs that alias the original location and may be killed by the starting def. In order to kill a def we found, we also have to make sure that there are no clobbering reads between the 2 defs. I think for that, we have to look at the users of the dominating def and check if any of them are read clobbers. I *think* we still end up with needing to check fewer uses. Does the bottom-up approach make sense to you?

I still have to think a bit more about a few points below (I would suggest adding them as follow-up patches anyways), but it would be great to hear your thoughts about using the bottom-up approach over top-down, before addressing the remaining comments here.

  1. deal with PHIs both when walking up and also when checking the users.
  2. how to support eliminating stores like
   store i32 0, i32* %Ptr.  ; DEAD
   br i1 %cond, label %a, label %b
a:
  store i32 2, i32* %Ptr
   br label %c
b: 
  store i32 2, i32* %Ptr
  br label %c

Thank you for the other patch! I added a few comments on the bottom up approach. I think overall it's a better high-level approach.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1604

s/arn't MemoryDef's/aren't MemoryDefs
s/MemoryDef that/MemoryDefs that

1722

There's a contradiction here. Either NextAccess is always a MemoryDef, in which case you can make it a MemoryDef inside the struct and remove the cast, or it can be a MemoryPhi, in which case ND is null.

fhahn abandoned this revision.Thu, Jan 30, 5:05 PM

I've updated the patch to do a bottom up walk to find candidates that can be eliminated in D72700. I'll update the linked patches to implement additional functionality on top of the bottom-up approach.