This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/trunk/
-
trunk/
-
lib/Transforms/Scalar/
-
Transforms/
-
Scalar/
-
DeadStoreElimination.cpp
-
test/Transforms/DeadStoreElimination/
-
Transforms/
-
DeadStoreElimination/
-
cross_block_dse.ll
-
cross_block_dse_loop.ll

Differential D11143

[RFC] Cross Block DSE
ClosedPublic

Authored by karthikthecool on Jul 13 2015, 6:55 AM.

Download Raw Diff

Details

Reviewers

hfinkel

Commits

rGddc2a86a000c: Add support for cross block dse. This patch enables dead stroe elimination…
rL245025: Add support for cross block dse.

Summary

Hi All,
I would like to propose a RFC on Cross Block Dead Store Elimination.

Cross Block Dead Store elimination implemented in this patch works as follows-

When a Store with non local memory dependency we call "handleNonLocalStoreDeletion".
In handleNonLocalStoreDeletion we traverse the predecessor blocks of the store instruction till we encounter post domination frontier and try to find a candidate store for deletion.
getStoreCandidates finds out if we have a candidate store for deletion which in any of the predecessor block.
If we find a candidate store we then see that the store is not used by any instruction in any of the path from the current block till block were the overwriting store is present.
We call DeleteDeadInstruction for the candidate store instruction if it is safe to remove the store.

No Regressions were found in make check-all and LLVM LNT test suites.

Please let me know your valuable option on the patch.

Thanks and Regards
Karthik Bhat

Diff Detail

Repository: rL LLVM

Event Timeline

karthikthecool updated this revision to Diff 29560.Jul 13 2015, 6:55 AM

karthikthecool retitled this revision from to [RFC] Cross Block DSE.

karthikthecool updated this object.

karthikthecool added a reviewer: hfinkel.

karthikthecool added a subscriber: llvm-commits.

Attaching LLVM LNT Test results.(15 Samples)
Execution time of 2 benchmarks improved post Cross block DSE patch. Surprisingly i didn't see any compile time regression.

eastig added a subscriber: eastig.Jul 13 2015, 7:51 AM

eastig added inline comments.Jul 13 2015, 7:57 AM

lib/Transforms/Scalar/DeadStoreElimination.cpp
655 ↗	(On Diff #29560)	'nullptr' is returned when BB is not well formed. Maybe it's better to 'assert' here?

Karthik,

I don't know what was a reason to deprecate the DominanceFrontier class (see comments in DominanceFrontier.h). Your PostDominanceFrontier might be deprecated for the same reason.

Hi Evgeny,
Thanks for looking into the patch. I think the reason dom frontier is deprecared is that it is costly to calculate the same.. im not sure as well though.
Im currently using postdom frontier to get the basic block boundry till which a store might overwrite a memory location making it dead. Dead store elimation without having postdom seems to be a bit tough especially given the overhead it would require without postdom.
Thanks
Karthik

First, If you want to calculate post-dom-frontier, you should use the
linear-time IDF calculator and make it work for postdominance
frontiers, and just calculate the pieces you want.

Second, this looks like a *really* expensive approach to doing this.

A. Is there a reason you aren't just using a hash table storing each
store you see in the top-down walk, so that you don't have to go
looking for stores to eliminate?
In fact, you can do better than even this:
Walk the function once, finding stores.
Store the stores in a linked list ordered by DFS in/out numbers of the
postdom tree.

For your given block, get the DFS in/out of that node in the postdom tree.
The stores you may be able to eliminate are those in the list starting
at DFS in and ending at DFS out.

Calculating the post-dom frontier just to figure out where to stop
looking seems remarkably expensive as well

(you could also just use memoryssa, once i finish my managery tasks
that have kept me busy the past month or two, and get back to llvm
work)

B. If you want complete store elimination, this is normally formulated
as a PRE-like problem on the reverse flowgraph.

Hi Daniel,
Thanks for the input. Yes this approach is definitely very expensive. I was surprised when i didn't see any compile time regressions on LLVM LNT.
Your approach of using postdom and dfs in/out seems very interesting I will try out the same and update shortly.
Thanks for your valuable inputs. Will get back with the updated patch shortly.

Thanks and Regards
Karthik Bhat

Hi Daniel,
Please find the updated patch as per your inputs. I have removed the dependency from PDT Frontier and using the DFS in/out on PDT to find the candidate store.

There are no regressions in make check-all and LLVM LNT.

I also compiled llvm with modified/unmodified clang binary to measure compile time overhead. Not much compile time overhead is observed. The rough figures are as shown below-

With Change-

real   35m42.705s
user  136m54.137s
sys    3m19.238s

Without Change-

real    35m38.747s
user   136m58.076s
sys     3m19.637s

In LLVM LNT we observe improvement in 2 benchmarks mentioned in the previous posts.

Please let me know your valuable inputs on the same.

Thanks and Regards
Karthik Bhat

gentle ping..

hfinkel added inline comments.Jul 25 2015, 6:22 PM

lib/Transforms/Scalar/DeadStoreElimination.cpp
61 ↗	(On Diff #29877)	Use a DenseSet (I assume you're not depending on any ordering from std::map because it is storing pointers).
71 ↗	(On Diff #29877)	give -> given
74 ↗	(On Diff #29877)	You can use a range-based for loop here. for (const auto &I : BB) { ... }
85 ↗	(On Diff #29877)	Range-based for.
946 ↗	(On Diff #29877)	Use SmallVector<BasicBlock *, 16> (or some number you feel is appropriate).
947 ↗	(On Diff #29877)	Use DenseSet, or SmallPtrSet if the size will normally be small.
954 ↗	(On Diff #29877)	You don't need to do a cast here, you can compare a StoreInst* to an Instruction*, and they'll be equal only if they point to the same object (one is a base class of the other).
991 ↗	(On Diff #29877)	When you use a SmallVector instead, you can use pop_back_val().
1018 ↗	(On Diff #29877)	Add space before 'Conservatively'
1032 ↗	(On Diff #29877)	Maybe you should store these in a SetVector instead to make this more efficient.
1040 ↗	(On Diff #29877)	finding out candidate stroes -> finding candidate stores
1054 ↗	(On Diff #29877)	Range-based for?

Hi Hal,
Thanks for the review. Please find the updated patch addressing review comments.
Please let me know your opinion on the same.
Thanks and Regards
Karthik Bhat

Hi All,
Update the patch to fix a failure in LLVM-LNT testcase on 32bit ubuntu. Unfortunetly i was not able to catch it on my 64bit machine.
This failure occured as we incorrectly deleted a store that was followed by a store that partially aliased it.
When seraching if a candidate store is safe we conservately mark a store as unsafe if we found a store that may/partially alias the candidate store.
This fixes failure found in LLVM LNT on 32bit ubuntu.
We still observe some 30%+ execution time improvement in MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1.

Thanks and Regards
Karthik Bhat

Gentle ping..

Danny, are you okay with the algorithm at this point?

lib/Transforms/Scalar/DeadStoreElimination.cpp
1001 ↗	(On Diff #30807)	If I'm reading this correctly, this call to AA->alias is going to essentially re-do all of the same analysis done in the AA->getModRefInfo call above. Could we check whether I is a StoreInst first and only to one query on store instructions instead of two?

Hi Hal,
Thanks for looking into the patch. Please find my comments inline.
Regards
Karthik Bhat

lib/Transforms/Scalar/DeadStoreElimination.cpp
1001 ↗	(On Diff #30807)	Hi Hal, The first check ensures that we see all instructions that may modify/ref a pointer (this is the pointer the store to which we are planning to delete). In case we find an instruction that refers to the pointer e.g. a load we then return false indicating that the store cannot be removed. If the instruction is a store we can safely remove the 1st store if and only if both the store and the candidate MustAlias. To detect this i had to check the alias again. Thanks and Regards Karthik

hfinkel added inline comments.Aug 4 2015, 8:34 AM

lib/Transforms/Scalar/DeadStoreElimination.cpp
1001 ↗	(On Diff #30807)	I understand that the interface here is less than ideal, but could you not skip the getModRefInfo call entirely if `I` is a StoreInst, and just use the result from AA->alias instead? In other words, hoist the dyn_cast<StoreInst>-based check above the ModRef logic?

LGTM with these comments

lib/Transforms/Scalar/DeadStoreElimination.cpp
101 ↗	(On Diff #30807)	Do you have a testcase for where PDT->getRootNode() is null? because that seems broken :)
108 ↗	(On Diff #30807)	So, if we only have 1 block, why are you doing the rest of the work here, instead of returning false and exiting? :)
515 ↗	(On Diff #30807)	Much like the one above, i don't understand why this PDT->getRootNode() check is here. The PDT should always have a root node, fake or not. If you've got cases it doesn't, it seems like we should fix PDT
1001 ↗	(On Diff #30807)	As Karthik says, i'm not sure there is a way around this. You can get better time bounds with PRE-like algorithms, but the underlying work here has to be done. The getModRefInfo is like finding your data dependences. For stores they may not be mustalias. If they are mod/ref, they block removal because they may read or write the memory from the first store before the second store does the same thing, so you can't remove the first store. IE given Store of 1 Load of 1 Store of 1 You can't remove the first store. (You could actually see if it's legal to sink everything below the second store to enable you to delete the first, but that is effectively global code motion. not hard, just a different algorithm with it's own issues. You could also see if it's possible to delete the second store instead of the first, but this is hard in the simple cross-block DSE we are doing here) Same with Store of 1 may-store of 1 Load of 1 Store of 1 Now, interestingly, given: Store of 1 may-store of 1 Store of 1 You can still eliminate the first store. In fact, i'm struggling to come up with a case where Mod without ref matters if you have a later mustalias store. It's early, but i think it's safe to eliminate store over store with only a may-mod store in the way. Karthik, do you have a counter-example? If not, i'd expect the check on line 960 above to take this into account (IE be okay with nomodref and mod, but not ref or modref) But as you can see, even if there is nothing "in the way" between the first and second store, you still have to check whether you mustalias the second store to know that you can remove the first one. SSA based store-PRE algorithms basically do this "not on the fly". They build memory use chains so they don't have to walk blocks (sound familiar? :P). They still have to do the mod-ref checks and mustalias checks, however. They are just structured in a way that it is not N^2 to figure out placement.

Sigh, my review got drafted before, but submitted after you wrote
this, so it looks like a reply to this part.

You are right that you could hoist this check, IFF alias returns
must/mayalias in cases where getmodrefinfo returns mod.

Is this really guaranteed to be the case?

They are different properties, though modrefinfo is often implemented
with help from alias. In our world of synchronization instructions,
isn't it possible for ordering to cause something to not be aliased
with a store (which is about the abstract memory locations) but still
be modified by it (due to it being a release or whatever) ?

Or do we consider this aliasing in the AA interface to try to avoid
user confusion? :)

I haven't looked deeply enough to know if we truly have consistency
here. I know getModRefInfo performs checks on ordering, etc, before it
calls alias(), and i assumed those checks were actually necessary.
But, you know, not a lot of unit tests here :)

wengxt added a subscriber: wengxt.Aug 6 2015, 1:02 PM

jingyue added a subscriber: jingyue.Aug 6 2015, 1:03 PM

Hi Daniel,Hal,
Thanks for the review. Please find my comments inline. I will try to upload the updated patch shortly.
Thanks and Regards
Karthik

lib/Transforms/Scalar/DeadStoreElimination.cpp
101 ↗	(On Diff #30807)	Hi Daniel, I found in the below case we were getting getRootNode as NULL so added the above check. void foo() { while(1); }
108 ↗	(On Diff #30807)	Updated code to executed FindFunctionBackedges and BackEdgesMap population only if Count > 1. I think we cannot return false here because we still need to do single block dse.

Hi All,
Rebased patch and implemented few review comments.
Please let me know your inputs on the same.
Thanks and Regards
Karthik Bhat

I think we should just fix the PDT in a separate patch, and remove this check.

The documentation for getRootNode says:
" /// getRootNode - This returns the entry node for the CFG of the
function. If

/// this tree represents the post-dominance relations for a

function, however,

/// this root may be a node with the block == NULL.  This is the case when
/// there are multiple exit nodes from a particular function.  Consumers of
/// post-dominance information must be capable of dealing with this
/// possibility."

Note: This may be a node with *block* == NULL, not that the root node
may be null.

The problem here is in PDT building.

GenericDomTreeConstruction does this:
if (TraitsTy::child_begin(I) == TraitsTy::child_end(I))

addRoot(I);

This is wrong, obviously, since in the case of an infinite loop like
you've shown, it will have a self-loop, but it is a root because there
is no exit.

This whole mechanism of trying to detect the roots is broken, in fact.
It's essentially trying to determine reachability without a DFS, which
is not possible in general.

The *only* correct way (i'm aware of) to determine which things need
to be children of artificial exit/a root is to DFS from all clear exit
blocks, and then, for anything still unvisited, DFS from each of them,
marking them as children of the artificial exit
(you could also reparent anything you reach that was marked as a
child of artificial exit , if we want to prefer anything over the
artificial exit as a parent. But this should not be necessary for
correct answers).

Please file a bug with your testcase and cc me, and i'l update the bug
with details.

(I'll see if i can find time to prepare a patch)

Hi Daniel,
Thanks for the comments.. Filed a bug as
https://llvm.org/bugs/show_bug.cgi?id=24415 .
I too will try to have look at this issue seperatly.. Any other comments on this patch Daniel?
Thanks for your valuable time.
Regards
Karthik

Hi Karthik,
Daniel pointed me to your patch, because I'm also working on something in DeadStoreElimination: http://reviews.llvm.org/D11854

I think your patch is not directly related to what I'm doing. Nevertheless I looked at your patch and posted some comments.

Erik

lib/Transforms/Scalar/DeadStoreElimination.cpp
949 ↗	(On Diff #31649)	As I understand, this loop is just to get the iterator for I. You can get it simply by: BasicBlock::iterator BBI(SI);
959 ↗	(On Diff #31649)	I think using PtrSize here is wrong. It should be the size of the memory access. This can give you a wrong NoModRef, e.g. if I is a load of a global which has a smaller size than the pointer size. Maybe you should just get the MemoryLocation of SI instead of Pointer+PtrSize and use AA->getModRefInfo(I, Location).
1011 ↗	(On Diff #31649)	Why is a path with a backedge not safe?

Hi Erik,
Thanks for looking into the patch. Please find my comments inline. Will post the updated patch shortly.
Thanks and Regards
Karthik Bhat

lib/Transforms/Scalar/DeadStoreElimination.cpp

949 ↗

(On Diff #31649)

Updated the code.

959 ↗

(On Diff #31649)

Thanks updated the code.

1011 ↗

(On Diff #31649)

Hi Erik,
I came across the following code from MultiSource/Benchmarks/MiBench/consumer-typeset benchmark because of which i conservately added this check. The below .ll snippet is from file z36.c TrieRead function-

for.cond:                                         ; preds = %for.inc, %if.else.618
  %indvars.iv1644 = phi i64 [ %indvars.iv.next1645, %for.inc ], [ 0, %if.else.618 ]
  %j.0 = phi i32 [ %j.1, %for.inc ], [ 1, %if.else.618 ]
  %prev.0 = phi i32 [ %prev.1, %for.inc ], [ 56, %if.else.618 ]
  %arrayidx626 = getelementptr inbounds [512 x i8], [512 x i8]* %str, i64 0, i64 %indvars.iv1644
  %42 = load i8, i8* %arrayidx626, align 1, !tbaa !18
  switch i8 %42, label %if.else.636 [
    i8 0, label %for.end
    i8 45, label %for.inc
  ]

if.else.636:                                      ; preds = %for.cond
  %idxprom639 = sext i32 %j.0 to i64
  %arrayidx640 = getelementptr inbounds [512 x i8], [512 x i8]* %key, i64 0, i64 %idxprom639
  store i8 %42, i8* %arrayidx640, align 1, !tbaa !18
  %conv641 = trunc i32 %prev.0 to i8
  %inc642 = add nsw i32 %j.0, 1
  %arrayidx644 = getelementptr inbounds [512 x i8], [512 x i8]* %value, i64 0, i64 %idxprom639
  store i8 %conv641, i8* %arrayidx644, align 1, !tbaa !18
  br label %for.inc

for.inc:                                          ; preds = %for.cond, %if.else.636
  %j.1 = phi i32 [ %inc642, %if.else.636 ], [ %j.0, %for.cond ]
  %prev.1 = phi i32 [ 56, %if.else.636 ], [ 57, %for.cond ]
  %indvars.iv.next1645 = add nuw nsw i64 %indvars.iv1644, 1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  %prev.0.lcssa = phi i32 [ %prev.0, %for.cond ]
  %j.0.lcssa = phi i32 [ %j.0, %for.cond ]
  %idxprom647 = sext i32 %j.0.lcssa to i64
  %arrayidx648 = getelementptr inbounds [512 x i8], [512 x i8]* %key, i64 0, i64 %idxprom647
  store i8 46, i8* %arrayidx648, align 1, !tbaa !18
  %conv649 = trunc i32 %prev.0.lcssa to i8
  %inc650 = add nsw i32 %j.0.lcssa, 1
  %arrayidx652 = getelementptr inbounds [512 x i8], [512 x i8]* %value, i64 0, i64 %idxprom647
  store i8 %conv649, i8* %arrayidx652, align 1, !tbaa !18
  %idxprom653 = sext i32 %inc650 to i64
  %arrayidx654 = getelementptr inbounds [512 x i8], [512 x i8]* %key, i64 0, i64 %idxprom653
  store i8 0, i8* %arrayidx654, align 1, !tbaa !18
  %arrayidx657 = getelementptr inbounds [512 x i8], [512 x i8]* %value, i64 0, i64 %idxprom653
  store i8 56, i8* %arrayidx657, align 1, !tbaa !18
  %add658 = add nsw i32 %j.0.lcssa, 2
  %idxprom659 = sext i32 %add658 to i64
  %arrayidx660 = getelementptr inbounds [512 x i8], [512 x i8]* %value, i64 0, i64 %idxprom659
  store i8 0, i8* %arrayidx660, align 1, !tbaa !18
  %call666 = call fastcc i32 @TrieInsert(i8* %5, i8* %6, %struct.trie_rec* %11, i8* %arraydecay, i32 %inc)
  %tobool667 = icmp eq i32 %call666, 0
  br i1 %tobool667, label %if.then.668, label %while.cond.224.backedge

Here block for.end post dominates if.else.636 and for.inc and alias analysis returns MustAlias for the pair-

store i8 %42, i8* %arrayidx640, align 1, !tbaa !18
and
store i8 46, i8* %arrayidx648, align 1, !tbaa !18

and

store i8 %conv649, i8* %arrayidx652, align 1, !tbaa !18
and
store i8 %conv641, i8* %arrayidx644, align 1, !tbaa !18

When we do a DFS from block if.else.636 to see if it is safe to delete the store we take a backedge and reach for.end were we find a MustAlias Store and hence delete the store in the block if.else.636. But this is incorrect as the store in if.else.636 is inside a loop and stores to various memory location based on the iterator which is updated in for.cond but the store in for.end: stores to a single location after the loop is executed.
I came across this issue during testing hence conservately added a check not to take backedges during DFS.
I'm not sure if there is any other way to fix this. Please let me know if you have any inputs on this. It would be of great help.
Thanks

Updated patch as per Erik's comments.

hfinkel added inline comments.Aug 11 2015, 12:23 AM

lib/Transforms/Scalar/DeadStoreElimination.cpp
111 ↗	(On Diff #31771)	Range-based for
946 ↗	(On Diff #31771)	Line too long.
959 ↗	(On Diff #31771)	Line too long.
990 ↗	(On Diff #31771)	Line too long.
1017 ↗	(On Diff #31771)	Remove this function and instead call: DeadStores.count(SI)
1032 ↗	(On Diff #31771)	Line is too long.
1049 ↗	(On Diff #31771)	Line too long.

In D11143#217647, @dberlin wrote:

Sigh, my review got drafted before, but submitted after you wrote
this, so it looks like a reply to this part.

You are right that you could hoist this check, IFF alias returns
must/mayalias in cases where getmodrefinfo returns mod.

Is this really guaranteed to be the case?

I think that it should be. I'm not the best with the history here, but there's nothing in the comments on the interface that suggests that there should be a difference w.r.t. ordering/volatile of the instructions, and other clients, MemoryDependenceAnalysis for example, layer on their own checks for ordering/volatile.

They are different properties, though modrefinfo is often implemented
with help from alias. In our world of synchronization instructions,
isn't it possible for ordering to cause something to not be aliased
with a store (which is about the abstract memory locations) but still
be modified by it (due to it being a release or whatever) ?

There are certainly ordering constraints to preserve, but I don't believe that AA is the right place to enforce them. Clients can certainly bail on atomics, and should if they might disturb the ordering and can't do better, but optimizations on atomics are still possible and we don't want AA clients that can handle some optimizations on some atomics to need to work around the AA interface because it forces a conservative behavior.

So this is the wrong thread to fix this problem ;) -- but I don't think adding any new dependence on this behavior is a good thing.

Or do we consider this aliasing in the AA interface to try to avoid
user confusion? :)

I haven't looked deeply enough to know if we truly have consistency
here. I know getModRefInfo performs checks on ordering, etc, before it
calls alias(), and i assumed those checks were actually necessary.
But, you know, not a lot of unit tests here :)

Indeed I know :)

Run clang-format on updated patch and implement Hal's review comments.
I agree with Hal that we may hoist dyn_cast<StoreInst> above ModRef check. In case of store we can rely on Alias Analysis's alias to check if the stores mustalias. For other cases we can use getModRefInfo to make make sure that this instruction which is not a store does not Mod/Ref the pointer. Updated and tested the code after modification.
Please let me know your inputs on the same.

Thanks once again to all reviewers for spending their valuable time on this patch.
Regards
Karthik Bhat

Sorry guys for multiple updates on patch..:( forgot to move dyn_cast<StoreInst> above getModRefInfo in one of the place in the previous patch.
Regards
Karthik Bhat

eeckstein added inline comments.Aug 11 2015, 11:29 AM

lib/Transforms/Scalar/DeadStoreElimination.cpp
958 ↗	(On Diff #31788)	This is still not correct. You are passing the size of the pointer and not of the referenced memory location.
1010 ↗	(On Diff #31788)	Thanks, I see the problem.

jevinskie added a subscriber: jevinskie.Aug 11 2015, 2:44 PM

Updated patch as per Erik's comments.
Thanks and Regards
Karthik Bhat

Rebase patch and resolve confilts.
LNT Statistics:
Number of Cross Block Stores eliminated in LLVM-LNT: 849
Number of Benchmarks improved: 2

MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 : 38.86%
MultiSource/Benchmarks/mafft/pairlocalalign : 5.36%

No Regressions observed.

Hi Hal,Erik,Daniel,
Please let me know if you have any additional comments or if i can go ahead with committing this patch.
Thanks and Regards
Karthik Bhat

LGTM.

lib/Transforms/Scalar/DeadStoreElimination.cpp
1002 ↗	(On Diff #31900)	Missing space after period.

This revision is now accepted and ready to land.Aug 13 2015, 10:13 AM

Closed by commit rL245025: Add support for cross block dse. (authored by karthik). · Explain WhyAug 13 2015, 9:18 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

lib/

Transforms/

Scalar/

DeadStoreElimination.cpp

275 lines

test/

Transforms/

DeadStoreElimination/

cross_block_dse.ll

76 lines

cross_block_dse_loop.ll

104 lines

Diff 32127

llvm/trunk/lib/Transforms/Scalar/DeadStoreElimination.cpp

Show All 10 Lines
// basic-block local redundant stores.		// basic-block local redundant stores.
//		//
// FIXME: This should eventually be extended to be a post-dominator tree		// FIXME: This should eventually be extended to be a post-dominator tree
// traversal. Doing so would be pretty trivial.		// traversal. Doing so would be pretty trivial.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/CaptureTracking.h"		#include "llvm/Analysis/CaptureTracking.h"
		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/MemoryDependenceAnalysis.h"		#include "llvm/Analysis/MemoryDependenceAnalysis.h"
		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "dse"		#define DEBUG_TYPE "dse"

STATISTIC(NumRedundantStores, "Number of redundant stores deleted");		STATISTIC(NumRedundantStores, "Number of redundant stores deleted");
STATISTIC(NumFastStores, "Number of stores deleted");		STATISTIC(NumFastStores, "Number of stores deleted");
		STATISTIC(NumCrossBlockStores, "Number of cross block stores deleted");
STATISTIC(NumFastOther , "Number of other instrs removed");		STATISTIC(NumFastOther , "Number of other instrs removed");

namespace {		namespace {
struct DSE : public FunctionPass {		struct DSE : public FunctionPass {
AliasAnalysis *AA;		AliasAnalysis *AA;
MemoryDependenceAnalysis *MD;		MemoryDependenceAnalysis *MD;
DominatorTree *DT;		DominatorTree *DT;
		PostDominatorTree *PDT;
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;
		SmallVector<SmallVector<StoreInst *, 8>, 16> Candidates;
		SetVector<StoreInst *> DeadStores;
		SmallVector<std::pair<const BasicBlock , const BasicBlock >, 32>
		BackEdges;
		DenseSet<std::pair<const BasicBlock , const BasicBlock >> BackEdgesMap;
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
DSE() : FunctionPass(ID), AA(nullptr), MD(nullptr), DT(nullptr) {		DSE()
		: FunctionPass(ID), AA(nullptr), MD(nullptr), DT(nullptr),
		PDT(nullptr) {
initializeDSEPass(*PassRegistry::getPassRegistry());		initializeDSEPass(*PassRegistry::getPassRegistry());
}		}
		// Return all stores in a given BasicBlock.
		SmallVector<StoreInst , 8> getStores(BasicBlock BB) {
		SmallVector<StoreInst *, 8> VecStores;
		for (auto &BI : *BB) {
		if (StoreInst *SI = dyn_cast<StoreInst>(&BI))
		VecStores.push_back(SI);
		}
		return VecStores;
		}

		// Get dfs in/out on the PDT and populate Candidates store list which
		// is used to find potential dead stores for a given block
		void populateCandidateStores(Function &F) {
		for (auto &I : F) {
		DomTreeNode *DTNode = PDT->getNode(&I);
		if (!DTNode)
		continue;
		int DFSIn = DTNode->getDFSNumIn();
		SmallVector<StoreInst *, 8> VecStores = getStores(&I);
		Candidates[DFSIn] = VecStores;
		}
		}

bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
if (skipOptnoneFunction(F))		if (skipOptnoneFunction(F))
return false;		return false;

AA = &getAnalysis<AliasAnalysis>();		AA = &getAnalysis<AliasAnalysis>();
MD = &getAnalysis<MemoryDependenceAnalysis>();		MD = &getAnalysis<MemoryDependenceAnalysis>();
DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
		PDT = &getAnalysis<PostDominatorTree>();
		if (PDT->getRootNode()) {
		int Count = PDT->getRootNode()->getDFSNumOut();
		SmallVector<StoreInst *, 8> VecStores;
		Candidates.resize(Count + 1);
		Candidates.assign(Count + 1, VecStores);

		// If we have more than 1 block try to populate candidate store.
		if (Count > 1) {
		populateCandidateStores(F);
		FindFunctionBackedges(F, BackEdges);
		for (auto I : BackEdges)
		BackEdgesMap.insert(I);
		}
		}
bool Changed = false;		bool Changed = false;
for (Function::iterator I = F.begin(), E = F.end(); I != E; ++I)		for (Function::iterator I = F.begin(), E = F.end(); I != E; ++I)
// Only check non-dead blocks. Dead blocks may have strange pointer		// Only check non-dead blocks. Dead blocks may have strange pointer
// cycles that will confuse alias analysis.		// cycles that will confuse alias analysis.
if (DT->isReachableFromEntry(I))		if (DT->isReachableFromEntry(I))
Changed \|= runOnBasicBlock(*I);		Changed \|= runOnBasicBlock(*I);

AA = nullptr; MD = nullptr; DT = nullptr;		AA = nullptr; MD = nullptr; DT = nullptr;
return Changed;		return Changed;
}		}

bool runOnBasicBlock(BasicBlock &BB);		bool runOnBasicBlock(BasicBlock &BB);
bool MemoryIsNotModifiedBetween(LoadInst LI, StoreInst SI);		bool MemoryIsNotModifiedBetween(LoadInst LI, StoreInst SI);
bool HandleFree(CallInst *F);		bool HandleFree(CallInst *F);
bool handleEndBlock(BasicBlock &BB);		bool handleEndBlock(BasicBlock &BB);
void RemoveAccessedObjects(const MemoryLocation &LoadedLoc,		void RemoveAccessedObjects(const MemoryLocation &LoadedLoc,
SmallSetVector<Value *, 16> &DeadStackObjects,		SmallSetVector<Value *, 16> &DeadStackObjects,
const DataLayout &DL);		const DataLayout &DL);
		void handleNonLocalStoreDeletion(StoreInst *SI);
		bool isSafeCandidateForDeletion(BasicBlock SrcBlock, BasicBlock SinkBlock,
		StoreInst *SI);
		void DeleteDeadInstruction(Instruction *I, MemoryDependenceAnalysis &MD,
		const TargetLibraryInfo &TLI,
		SmallSetVector<Value , 16> ValueSet = nullptr);
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<AliasAnalysis>();		AU.addRequired<AliasAnalysis>();
AU.addRequired<MemoryDependenceAnalysis>();		AU.addRequired<MemoryDependenceAnalysis>();
		AU.addRequired<PostDominatorTree>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addPreserved<AliasAnalysis>();		AU.addPreserved<AliasAnalysis>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
AU.addPreserved<MemoryDependenceAnalysis>();		AU.addPreserved<MemoryDependenceAnalysis>();
		AU.addPreserved<PostDominatorTree>();
}		}
};		};
}		}

char DSE::ID = 0;		char DSE::ID = 0;
INITIALIZE_PASS_BEGIN(DSE, "dse", "Dead Store Elimination", false, false)		INITIALIZE_PASS_BEGIN(DSE, "dse", "Dead Store Elimination", false, false)
INITIALIZE_AG_DEPENDENCY(AliasAnalysis)		INITIALIZE_AG_DEPENDENCY(AliasAnalysis)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MemoryDependenceAnalysis)		INITIALIZE_PASS_DEPENDENCY(MemoryDependenceAnalysis)
		INITIALIZE_PASS_DEPENDENCY(PostDominatorTree)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(DSE, "dse", "Dead Store Elimination", false, false)		INITIALIZE_PASS_END(DSE, "dse", "Dead Store Elimination", false, false)

FunctionPass *llvm::createDeadStoreEliminationPass() { return new DSE(); }		FunctionPass *llvm::createDeadStoreEliminationPass() { return new DSE(); }

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Helper functions		// Helper functions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// DeleteDeadInstruction - Delete this instruction. Before we do, go through
/// and zero out all the operands of this instruction. If any of them become
/// dead, delete them and the computation tree that feeds them.
///
/// If ValueSet is non-null, remove any deleted instructions from it as well.
///
static void DeleteDeadInstruction(Instruction *I,
MemoryDependenceAnalysis &MD,
const TargetLibraryInfo &TLI,
SmallSetVector<Value, 16> ValueSet = nullptr) {
SmallVector<Instruction*, 32> NowDeadInsts;

NowDeadInsts.push_back(I);
--NumFastOther;

// Before we touch this instruction, remove it from memdep!
do {
Instruction *DeadInst = NowDeadInsts.pop_back_val();
++NumFastOther;

// This instruction is dead, zap it, in stages. Start by removing it from
// MemDep, which needs to know the operands and needs it to be in the
// function.
MD.removeInstruction(DeadInst);

for (unsigned op = 0, e = DeadInst->getNumOperands(); op != e; ++op) {
Value *Op = DeadInst->getOperand(op);
DeadInst->setOperand(op, nullptr);

// If this operand just became dead, add it to the NowDeadInsts list.
if (!Op->use_empty()) continue;

if (Instruction *OpI = dyn_cast<Instruction>(Op))
if (isInstructionTriviallyDead(OpI, &TLI))
NowDeadInsts.push_back(OpI);
}

DeadInst->eraseFromParent();

if (ValueSet) ValueSet->remove(DeadInst);
} while (!NowDeadInsts.empty());
}


/// hasMemoryWrite - Does this instruction write some memory? This only returns		/// hasMemoryWrite - Does this instruction write some memory? This only returns
/// true for things that we can analyze with other helpers below.		/// true for things that we can analyze with other helpers below.
static bool hasMemoryWrite(Instruction *I, const TargetLibraryInfo &TLI) {		static bool hasMemoryWrite(Instruction *I, const TargetLibraryInfo &TLI) {
if (isa<StoreInst>(I))		if (isa<StoreInst>(I))
return true;		return true;
if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
default:		default:
▲ Show 20 Lines • Show All 356 Lines • ▼ Show 20 Lines	if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {
MadeChange = true;		MadeChange = true;
continue;		continue;
}		}
}		}
}		}

MemDepResult InstDep = MD->getDependency(Inst);		MemDepResult InstDep = MD->getDependency(Inst);

// Ignore any store where we can't find a local dependence.		if (!InstDep.isDef() && !InstDep.isClobber() && !InstDep.isNonLocal())
// FIXME: cross-block DSE would be fun. :)		continue;
if (!InstDep.isDef() && !InstDep.isClobber())		if (InstDep.isNonLocal()) {
		if (!PDT->getRootNode())
		continue;
		if (StoreInst *SI = dyn_cast<StoreInst>(Inst))
		handleNonLocalStoreDeletion(SI);
continue;		continue;
		}

// Figure out what location is being stored to.		// Figure out what location is being stored to.
MemoryLocation Loc = getLocForWrite(Inst, *AA);		MemoryLocation Loc = getLocForWrite(Inst, *AA);

// If we didn't get a useful location, fail.		// If we didn't get a useful location, fail.
if (!Loc.Ptr)		if (!Loc.Ptr)
continue;		continue;

▲ Show 20 Lines • Show All 157 Lines • ▼ Show 20 Lines	for (pred_iterator I = pred_begin(BB), E = pred_end(BB); I != E; ++I) {
if (PredTI->getNumSuccessors() != 1)		if (PredTI->getNumSuccessors() != 1)
continue;		continue;

if (DT->isReachableFromEntry(Pred))		if (DT->isReachableFromEntry(Pred))
Blocks.push_back(Pred);		Blocks.push_back(Pred);
}		}
}		}

		/// DeleteDeadInstruction - Delete this instruction. Before we do, go through
		/// and zero out all the operands of this instruction. If any of them become
		/// dead, delete them and the computation tree that feeds them.
		/// If ValueSet is non-null, remove any deleted instructions from it as well.
		void DSE::DeleteDeadInstruction(Instruction *I, MemoryDependenceAnalysis &MD,
		const TargetLibraryInfo &TLI,
		SmallSetVector<Value , 16> ValueSet) {
		SmallVector<Instruction *, 32> NowDeadInsts;

		NowDeadInsts.push_back(I);
		--NumFastOther;

		// Before we touch this instruction, remove it from memdep!
		do {
		Instruction *DeadInst = NowDeadInsts.pop_back_val();
		++NumFastOther;
		if (StoreInst *SI = dyn_cast<StoreInst>(DeadInst))
		DeadStores.insert(SI);

		// This instruction is dead, zap it, in stages. Start by removing it from
		// MemDep, which needs to know the operands and needs it to be in the
		// function.
		MD.removeInstruction(DeadInst);

		for (unsigned op = 0, e = DeadInst->getNumOperands(); op != e; ++op) {
		Value *Op = DeadInst->getOperand(op);
		DeadInst->setOperand(op, nullptr);

		// If this operand just became dead, add it to the NowDeadInsts list.
		if (!Op->use_empty())
		continue;

		if (Instruction *OpI = dyn_cast<Instruction>(Op))
		if (isInstructionTriviallyDead(OpI, &TLI))
		NowDeadInsts.push_back(OpI);
		}

		DeadInst->eraseFromParent();

		if (ValueSet)
		ValueSet->remove(DeadInst);
		} while (!NowDeadInsts.empty());
		}

/// HandleFree - Handle frees of entire structures whose dependency is a store		/// HandleFree - Handle frees of entire structures whose dependency is a store
/// to a field of that structure.		/// to a field of that structure.
bool DSE::HandleFree(CallInst *F) {		bool DSE::HandleFree(CallInst *F) {
bool MadeChange = false;		bool MadeChange = false;

MemoryLocation Loc = MemoryLocation(F->getOperand(0));		MemoryLocation Loc = MemoryLocation(F->getOperand(0));
SmallVector<BasicBlock *, 16> Blocks;		SmallVector<BasicBlock *, 16> Blocks;
Blocks.push_back(F->getParent());		Blocks.push_back(F->getParent());
▲ Show 20 Lines • Show All 211 Lines • ▼ Show 20 Lines	void DSE::RemoveAccessedObjects(const MemoryLocation &LoadedLoc,

// Remove objects that could alias LoadedLoc.		// Remove objects that could alias LoadedLoc.
DeadStackObjects.remove_if([&](Value *I) {		DeadStackObjects.remove_if([&](Value *I) {
// See if the loaded location could alias the stack location.		// See if the loaded location could alias the stack location.
MemoryLocation StackLoc(I, getPointerSize(I, DL, *TLI));		MemoryLocation StackLoc(I, getPointerSize(I, DL, *TLI));
return !AA->isNoAlias(StackLoc, LoadedLoc);		return !AA->isNoAlias(StackLoc, LoadedLoc);
});		});
}		}

		/// isSafeCandidateForDeletion- Check all paths from the SrcBlock till
		/// SinkBlock to see if Store 'SI' is safe to be remove.
		/// Returns true if the candidate store SI is safe to delete
		/// else returns false.
		bool DSE::isSafeCandidateForDeletion(BasicBlock *SrcBlock,
		BasicBlock SinkBlock, StoreInst SI) {
		SmallVector<BasicBlock *, 16> WorkList;
		SmallPtrSet<BasicBlock *, 8> Visited;
		BasicBlock::iterator BBI(SI);

		// Check from the store till end of block and make sure we have no references
		// to memory stored by this Store Instruction.
		for (auto BI = ++BBI, BE = SrcBlock->end(); BI != BE; ++BI) {
		Instruction *I = BI;
		StoreInst *CSI = dyn_cast<StoreInst>(I);
		if (CSI) {
		AliasResult R =
		AA->alias(MemoryLocation::get(SI), MemoryLocation::get(CSI));
		if (R == MustAlias)
		return true;
		} else {
		ModRefInfo Res = AA->getModRefInfo(I, MemoryLocation::get(SI));
		if (Res != MRI_NoModRef)
		return false;
		}
		}

		// Add successors of the block to stack and start DFS.
		for (succ_iterator I = succ_begin(SrcBlock), E = succ_end(SrcBlock); I != E;
		++I) {
		if (!Visited.insert(*I).second)
		continue;
		// A path with backedge may not be safe. Conservatively mark
		// this store unsafe.
		if (BackEdgesMap.count(std::make_pair(SrcBlock, *I)))
		return false;
		WorkList.push_back(*I);
		}

		while (!WorkList.empty()) {
		BasicBlock *B = WorkList.pop_back_val();
		auto BI = B->begin();
		auto BE = B->end();
		for (; BI != BE; ++BI) {
		Instruction *I = BI;
		StoreInst *CSI = dyn_cast<StoreInst>(I);
		if (CSI) {
		AliasResult R =
		AA->alias(MemoryLocation::get(SI), MemoryLocation::get(CSI));
		if (R == MustAlias)
		break;
		} else {
		ModRefInfo Res = AA->getModRefInfo(I, MemoryLocation::get(SI));
		if (Res != MRI_NoModRef)
		return false;
		}
		}

		// If we reached the sink node or we found a block which has a stores that
		// overwrites the candidate block we need not look at their successors.
		if (B == SinkBlock \|\| BI != BE)
		continue;

		for (succ_iterator I = succ_begin(B), E = succ_end(B); I != E; ++I) {
		if (!Visited.insert(*I).second)
		continue;
		// A path with backedge may not be safe.Conservatively mark
		// this store unsafe.
		if (BackEdgesMap.count(std::make_pair(B, *I)))
		return false;
		WorkList.push_back(*I);
		}
		}

		return true;
		}

		/// handleNonLocalStoreDeletion - Handle non local dead store elimination.
		/// This works by finding candidate stores using PDT and then running DFS
		/// from candidate store block checking all paths to make sure the store is
		/// safe to delete.
		void DSE::handleNonLocalStoreDeletion(StoreInst *SI) {
		BasicBlock *BB = SI->getParent();
		Value *Pointer = SI->getPointerOperand();
		DomTreeNode *DTNode = PDT->getNode(BB);
		if (!DTNode)
		return;

		int DFSNumIn = DTNode->getDFSNumIn();
		int DFSNumOut = DTNode->getDFSNumOut();
		for (int i = DFSNumIn + 1; i < DFSNumOut; ++i) {
		for (auto &I : Candidates[i]) {
		StoreInst *CandidateSI = I;
		if (DeadStores.count(CandidateSI))
		continue;
		Value *MemPtr = CandidateSI->getPointerOperand();
		if (!MemPtr)
		continue;
		if (Pointer->getType() != MemPtr->getType())
		continue;
		AliasResult R =
		AA->alias(MemoryLocation::get(SI), MemoryLocation::get(CandidateSI));
		if (R != MustAlias)
		continue;
		if (isSafeCandidateForDeletion(CandidateSI->getParent(), BB,
		CandidateSI)) {
		DeleteDeadInstruction(CandidateSI, MD, TLI);
		++NumCrossBlockStores;
		}
		}
		}
		}

llvm/trunk/test/Transforms/DeadStoreElimination/cross_block_dse.ll

				; RUN: opt < %s -basicaa -dse -S \| FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				@x = common global i32 0
				@y = common global i32 0

				define void @test_01(i32 %N) {
				%1 = alloca i32
				store i32 %N, i32* %1
				store i32 10, i32* @x
				%2 = load i32, i32* %1
				%3 = icmp ne i32 %2, 0
				br i1 %3, label %4, label %5

				; <label>:4
				store i32 5, i32* @x
				br label %5

				; <label>:5
				store i32 15, i32* @x
				ret void
				}
				; CHECK-LABEL: @test_01(
				; CHECK-NOT: store i32 10, i32* @x
				; CHECK-NOT: store i32 5, i32* @x
				; CHECK: store i32 15, i32* @x


				define void @test_02(i32 %N) {
				%1 = alloca i32
				store i32 %N, i32* %1
				store i32 10, i32* @x
				%2 = load i32, i32* %1
				%3 = icmp ne i32 %2, 0
				br i1 %3, label %4, label %5

				; <label>:4
				store i32 5, i32* @x
				br label %7

				; <label>:5
				%6 = load i32, i32* @x
				store i32 %6, i32* @y
				br label %7

				; <label>:7
				store i32 15, i32* @x
				ret void
				}
				; CHECK-LABEL: @test_02(
				; CHECK: store i32 10, i32* @x
				; CHECK-NOT: store i32 5, i32* @x
				; CHECK: store i32 %6, i32* @y


				define void @test_03(i32 %N) #0 {
				%1 = alloca i32
				store i32 %N, i32* %1
				store i32 10, i32* @x
				%2 = load i32, i32* %1
				%3 = icmp ne i32 %2, 0
				br i1 %3, label %4, label %6

				; <label>:4 ; preds = %0
				%5 = load i32, i32* @x
				store i32 %5, i32* @y
				br label %6

				; <label>:6 ; preds = %4, %0
				store i32 15, i32* @x
				ret void
				}
				; CHECK-LABEL: @test_03(
				; CHECK: store i32 10, i32* @x
				; CHECK: store i32 %5, i32* @y
				; CHECK: store i32 15, i32* @x

llvm/trunk/test/Transforms/DeadStoreElimination/cross_block_dse_loop.ll

				; RUN: opt < %s -basicaa -dse -S \| FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				@A = common global [100 x i32] zeroinitializer, align 16
				@x = common global i32 0

				; Negative Test case-
				;void foo(int N) {
				; A[0] = N;
				; for(int i=0;i<N;++i)
				; A[i]+=i;
				; A[0] = 10;
				;}
				;; Stores should not be optimized away.

				define void @test_01(i32 %N) #0 {
				%1 = alloca i32
				%i = alloca i32
				store i32 %N, i32* %1
				%2 = load i32, i32* %1
				store i32 %2, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i32 0, i64 0)
				store i32 0, i32* %i
				br label %3

				; <label>:3 ; preds = %14, %0
				%4 = load i32, i32* %i
				%5 = load i32, i32* %1
				%6 = icmp slt i32 %4, %5
				br i1 %6, label %7, label %17

				; <label>:7 ; preds = %3
				%8 = load i32, i32* %i
				%9 = load i32, i32* %i
				%10 = sext i32 %9 to i64
				%11 = getelementptr inbounds [100 x i32], [100 x i32]* @A, i32 0, i64 %10
				%12 = load i32, i32* %11
				%13 = add nsw i32 %12, %8
				store i32 %13, i32* %11
				br label %14

				; <label>:14 ; preds = %7
				%15 = load i32, i32* %i
				%16 = add nsw i32 %15, 1
				store i32 %16, i32* %i
				br label %3

				; <label>:17 ; preds = %3
				store i32 10, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i32 0, i64 0)
				ret void
				}
				; CHECK-LABEL: @test_01(
				; CHECK: store i32 %2, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i32 0, i64 0)
				; CHECK: store i32 %13, i32* %11
				; CHECK: store i32 10, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i32 0, i64 0)


				; Postive Test case-
				;void foo(int N) {
				; A[0] = N;
				; for(int i=0;i<N;++i)
				; A[i]=i;
				; A[0] = 10;
				;}
				;; Stores should not be optimized away.
				define void @test_02(i32 %N) #0 {
				%1 = alloca i32
				%i = alloca i32
				store i32 %N, i32* %1
				%2 = load i32, i32* %1
				store i32 %2, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i32 0, i64 0)
				store i32 0, i32* %i
				br label %3

				; <label>:3 ; preds = %12, %0
				%4 = load i32, i32* %i
				%5 = load i32, i32* %1
				%6 = icmp slt i32 %4, %5
				br i1 %6, label %7, label %15

				; <label>:7 ; preds = %3
				%8 = load i32, i32* %i
				%9 = load i32, i32* %i
				%10 = sext i32 %9 to i64
				%11 = getelementptr inbounds [100 x i32], [100 x i32]* @A, i32 0, i64 %10
				store i32 %8, i32* %11
				br label %12

				; <label>:12 ; preds = %7
				%13 = load i32, i32* %i
				%14 = add nsw i32 %13, 1
				store i32 %14, i32* %i
				br label %3

				; <label>:15 ; preds = %3
				store i32 10, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i32 0, i64 0)
				ret void
				}

				; CHECK-LABEL: @test_02(
				; CHECK-NOT: store i32 %2, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i32 0, i64 0)
				; CHECK: store i32 %7, i32* %10
				; CHECK: store i32 10, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i32 0, i64 0)

This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Cross Block DSEClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 32127

llvm/trunk/lib/Transforms/Scalar/DeadStoreElimination.cpp

llvm/trunk/test/Transforms/DeadStoreElimination/cross_block_dse.ll

llvm/trunk/test/Transforms/DeadStoreElimination/cross_block_dse_loop.ll

[RFC] Cross Block DSE
ClosedPublic