This is an archive of the discontinued LLVM Phabricator instance.

Differential D100464

[DSE] Remove stores in the same loop iteration
Closed, Public

Authored by dmgreen on Apr 14 2021, 4:15 AM.

Details

Reviewers

fhahn
asbirlea
nikic
hfinkel
clin1

Summary

DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. I believe this should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in certain matrix operations:

for(i=..)
  dst[i] = 0;
  for(j=..)
    dst[i] += src[i*n+j];

After LICM, this becomes:

for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, but is not currently removed.
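As a rough, standalone illustration of the summary's example (not code from the patch), the sketch below writes both forms in C++. The function names, the `KeepDeadStore` flag, and the use of `Cols` for `n` are assumptions made for this example only. It shows that once the sum is kept in a local, the zeroing store can be dropped without changing the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: dst[i] is zeroed and then updated through memory.
std::vector<int> rowSumsOriginal(const std::vector<int> &Src, std::size_t Rows,
                                 std::size_t Cols) {
  assert(Src.size() == Rows * Cols && "Src must hold Rows * Cols elements");
  std::vector<int> Dst(Rows, -1); // -1 marks "not yet written"
  for (std::size_t I = 0; I < Rows; ++I) {
    Dst[I] = 0;
    for (std::size_t J = 0; J < Cols; ++J)
      Dst[I] += Src[I * Cols + J];
  }
  return Dst;
}

// Form after LICM-style promotion. KeepDeadStore (hypothetical flag) toggles
// the zeroing store that is overwritten later in the same iteration.
std::vector<int> rowSumsPromoted(const std::vector<int> &Src, std::size_t Rows,
                                 std::size_t Cols, bool KeepDeadStore) {
  assert(Src.size() == Rows * Cols && "Src must hold Rows * Cols elements");
  std::vector<int> Dst(Rows, -1);
  for (std::size_t I = 0; I < Rows; ++I) {
    if (KeepDeadStore)
      Dst[I] = 0; // dead: nothing reads Dst[I] before the store below
    int Sum = 0;
    for (std::size_t J = 0; J < Cols; ++J)
      Sum += Src[I * Cols + J];
    Dst[I] = Sum;
  }
  return Dst;
}
```

Both stores in `rowSumsPromoted` sit at the same loop level but in different blocks, which is the case the summary says the old same-block rule misses.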

Diff Detail

Event Timeline

dmgreen created this revision. Apr 14 2021, 4:15 AM

Herald added subscribers: kerbowa, george.burgess.iv, hiraditya and 4 others. Apr 14 2021, 4:15 AM

dmgreen requested review of this revision. Apr 14 2021, 4:15 AM

Herald added a project: Restricted Project. Apr 14 2021, 4:15 AM

Harbormaster completed remote builds in B98662: Diff 337391. Apr 14 2021, 4:57 AM

xbolva00 added a subscriber: xbolva00. Apr 14 2021, 5:10 AM

Thanks for the patch! Given that this only shifts the invocation of LI, there shouldn't be any problems in terms of compile time.

I'm not sure how well the loop cases are covered in multiblock-loops.ll, but I think it might be worth adding a few more interesting cases that we can and can't optimize, especially with stores to different indices.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1308	Can `LI` just be part of DSEState?
2048	Should also preserve `LoopAnalysis`?

nikic added inline comments. Apr 14 2021, 12:44 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1424–1425	I'm not sure this logic is correct. The problem I see is that LoopInfo only tracks natural loops. This means that even though Current and KillingDef might be in the same natural loop (say the nullptr loop), Current might still be part of a (non-natural) cycle, in which case KillingDef may not kill it. I'm not particularly familiar with LoopInfo, but that's my understanding of how it works.

fhahn added inline comments. Apr 15 2021, 2:09 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1424–1425	> I'm not sure this logic is correct. The problem I see is that LoopInfo only tracks natural loops. This means that even though Current and KillingDef might be in the same natural loop (say the nullptr loop), Current might still be part of a (non-natural) cycle, in which case KillingDef may not kill it. I'm not particularly familiar with LoopInfo, but that's my understanding of how it works.

Yes I think that's correct. There might be cycles that LoopInfo does not detect. So we should not rely on comparing the loops on its own. If there is no loop for a given block, they still might be in a cycle that has not been detected by LoopInfo due to irreducible control flow; if there's a loop for a given block, it could be in a cycle that's completely contained in the loop we found. If LI finds a loop for a block, I think it is guaranteed that the loop is only entered through the header though. I think this should allow us to treat the `phis` in the header as being in the same iteration or invariant in any undetected cycles in the loop.
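To make the distinction between natural loops and other cycles concrete, here is a small standalone sketch. It is not LLVM's `mayContainIrreducibleControl`; it uses the textbook characterization that a CFG is reducible exactly when removing every back edge (an edge whose target dominates its source) leaves an acyclic graph. The `Graph` alias and both function names are hypothetical, and the sketch assumes block 0 is the entry and every block is reachable from it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Succs[B] lists the successors of block B; block 0 is the entry.
using Graph = std::vector<std::vector<int>>;

// Dom[B][D] is true when block D dominates block B (iterative data-flow).
std::vector<std::vector<bool>> computeDominators(const Graph &Succs) {
  const std::size_t N = Succs.size();
  std::vector<std::vector<int>> Preds(N);
  for (std::size_t B = 0; B < N; ++B)
    for (int S : Succs[B])
      Preds[S].push_back(static_cast<int>(B));

  std::vector<std::vector<bool>> Dom(N, std::vector<bool>(N, true));
  Dom[0].assign(N, false);
  Dom[0][0] = true;
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      // Intersect the dominator sets of all predecessors, then add B itself.
      std::vector<bool> New(N, true);
      for (int P : Preds[B])
        for (std::size_t D = 0; D < N; ++D)
          New[D] = New[D] && Dom[P][D];
      New[B] = true;
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// True when the CFG without its back edges is acyclic (Kahn's algorithm).
bool isReducible(const Graph &Succs) {
  assert(!Succs.empty() && "block 0 is the entry");
  const std::size_t N = Succs.size();
  const auto Dom = computeDominators(Succs);

  std::vector<int> InDegree(N, 0);
  for (std::size_t B = 0; B < N; ++B)
    for (int S : Succs[B])
      if (!Dom[B][S]) // skip back edges B -> S where S dominates B
        ++InDegree[S];

  std::vector<int> Work;
  for (std::size_t B = 0; B < N; ++B)
    if (InDegree[B] == 0)
      Work.push_back(static_cast<int>(B));

  std::size_t Visited = 0;
  while (!Work.empty()) {
    const int B = Work.back();
    Work.pop_back();
    ++Visited;
    for (int S : Succs[B])
      if (!Dom[B][S] && --InDegree[S] == 0)
        Work.push_back(S);
  }
  return Visited == N;
}
```

For example, the graph `{{1}, {2}, {1, 3}, {}}` is a single natural loop with header 1 and is reducible, whereas `{{1, 2}, {2}, {1, 3}, {}}` contains the cycle 1 <-> 2 entered at both blocks, so neither block dominates the other and no natural loop exists for a loop-based comparison to find.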

Add a ContainsIrreducibleLoops flag, Move LoopInfo and add some irreducible tests.

Harbormaster completed remote builds in B99101: Diff 338014. Apr 16 2021, 12:54 AM

> Thanks for the patch! Given that this only shifts the invocation of LI, there shouldn't be any problems in terms of compile time.

It should iterate the same amount too, so I'm hoping it's OK. (But I have no evidence one way or the other.)

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1424–1425	Ah, excellent point. I had not considered irreducible loops and they are not the kind of thing that naturally comes up in testing. (Unfortunately I feel like all the testing I could try would already have been run by Florian before DSE was committed, and it didn't show anything then just as it would not show any problems now. csmith didn't feel like much help). I was originally more worried about opening the door to any number of unrelated issues coming up from allowing DSE from different blocks where both LI's return nullptr. For the moment, I have added a check using mayContainIrreducibleControl on the function and bailing if there are any. What do you think? Too much of a blunt hammer, or an OK way to handle this?

Could you please rebase this? The patch doesn't apply cleanly to main. (Insert rant here about Phabricator not providing 3way patches, making it impossible to resolve conflicts when applying.)

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1424–1425	I think that's a good way to handle it. Dealing with irreducible control flow in full generality is unlikely to be worthwhile. Might want to extract this condition into a separate function, as it's getting a bit hard to read...

Rebase and move the condition logic into the start of IsGuaranteedLoopInvariant.

Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=da627258742ae638b813d0341f069d5b4a6bd9ae&to=6c119f84a7fa3d6ea3b82e89d482c31836d5126f&stat=instructions

There is some impact, with the largest regression being sqlite3 with ThinLTO. I expect the reason isn't any of the added infrastructure, but rather the fact that we will now perform a longer walk in many cases.

I believe this is still not quite correct. Consider the following variation:

@x = global [10 x i16] zeroinitializer, align 1

define i16 @test(i1 %cond) {
entry: 
  br label %do.body

do.body:                             
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %if.end2 ]
  %arrayidx2 = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 %i.0 
  store i16 2, i16* %arrayidx2, align 1 ;;; store is removed
  %exitcond = icmp eq i16 %i.0, 4
  br i1 %exitcond, label %exit, label %if.end

if.end:                   
  br i1 %cond, label %do.store, label %if.end2 
   
do.store:
  store i16 3, i16* %arrayidx2, align 1
  br label %if.end2 
   
if.end2:
  %inc = add nuw nsw i16 %i.0, 1
  br label %do.body 

exit:
  store i16 1, i16* %arrayidx2, align 1
  ret i16 0    
}

The current implementation will eliminate the store `store i16 2`.

The reason is that DSE determines eliminable stores in two phases: First, a dominating access for the killing store is found. And second, all uses of the dominating access are checked. When store i16 1 is the killing access, we will bail out due to the loop invariance check. When store i16 3 is the killing store, we will not, because they are in the same loop. The store i16 1 will be examined as a use and skipped as a fully overwriting store.
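The effect of the miscompile can be reproduced outside LLVM with a direct C++ transliteration of the IR above. This is only a sketch: `runTest` and the `KeepStoreOf2` flag are hypothetical names, and `X` stands in for `@x`. With `Cond` false, the stored 2s are still observable in `X[0..3]`, so dropping `store i16 2` changes the final memory contents.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Mirrors the control flow of @test; KeepStoreOf2 models whether DSE kept
// or removed `store i16 2`.
std::array<int, 10> runTest(bool Cond, bool KeepStoreOf2) {
  std::array<int, 10> X{}; // zeroinitializer
  std::size_t I = 0;
  while (true) {           // do.body
    assert(I < X.size() && "index stays within @x");
    if (KeepStoreOf2)
      X[I] = 2;            // store i16 2
    if (I == 4)            // %exitcond
      break;
    if (Cond)              // if.end
      X[I] = 3;            // do.store: store i16 3
    ++I;                   // if.end2
  }
  X[I] = 1;                // exit: store i16 1
  return X;
}
```

When `Cond` is true, every 2 is overwritten by a 3 in the same iteration and both variants agree. When `Cond` is false, only the final element (index 4) is overwritten, by `store i16 1`, so the other four stores of 2 are live.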

This revision now requires changes to proceed. Apr 16 2021, 1:24 PM

Harbormaster completed remote builds in B99215: Diff 338166. Apr 16 2021, 1:36 PM

Sigh. I should probably have found that problem. I hadn't considered multiple overwriting stores interacting like that.

Hello. OK. I'm back

This makes two changes. isOverwrite now checks IsGuaranteedLoopInvariant, returning OW_Unknown if they are not loop invariant. Otherwise we cannot trust the MustAlias that AA provides.
It also limits the IsGuaranteedLoopInvariant check from including global scope, only checking inside the same loop (or same block). This should hopefully limit the iterating on blocks that are very unlikely to be helpful.

Together these mean we keep removing stores in loops, but certain partial-alias overlaps that were removed before may no longer be. I don't believe those will come up very often, though.
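One way to read the two changes described above is sketched below in standalone C++. This is an interpretation of the prose, not the code in the diff: `StoreInfo`, `aliasResultIsTrustworthy`, and `classifyOverwrite` are hypothetical names, loop id -1 stands for "not in any loop", and equal `Pointer` ids stand in for a MustAlias answer from alias analysis.

```cpp
#include <cassert>

enum class OverwriteResult { Complete, Unknown };

// Hypothetical, simplified description of one store.
struct StoreInfo {
  int Block;             // id of the containing basic block
  int Loop;              // id of the innermost loop, or -1 outside any loop
  int Pointer;           // symbolic pointer id; equal ids model MustAlias
  bool PointerInvariant; // pointer provably identical in every iteration
};

// The alias answer can be trusted when both stores run in the same iteration
// (same block, or same detected loop with no irreducible control flow), or
// when the earlier pointer cannot change between iterations.
bool aliasResultIsTrustworthy(const StoreInfo &Earlier, const StoreInfo &Later,
                              bool ContainsIrreducibleLoops) {
  assert(Earlier.Loop >= -1 && Later.Loop >= -1 && "loop ids start at -1");
  if (Earlier.Block == Later.Block)
    return true;
  if (!ContainsIrreducibleLoops && Earlier.Loop != -1 &&
      Earlier.Loop == Later.Loop)
    return true;
  return Earlier.PointerInvariant;
}

// Report a complete overwrite only when the alias answer is trustworthy.
OverwriteResult classifyOverwrite(const StoreInfo &Earlier,
                                  const StoreInfo &Later,
                                  bool ContainsIrreducibleLoops) {
  if (!aliasResultIsTrustworthy(Earlier, Later, ContainsIrreducibleLoops))
    return OverwriteResult::Unknown;
  return Earlier.Pointer == Later.Pointer ? OverwriteResult::Complete
                                          : OverwriteResult::Unknown;
}
```

Under this reading, two stores through the same pointer value in different blocks of one loop are treated as a complete overwrite, while the same pair outside any loop, or in a function with irreducible control flow, falls back to the pointer-invariance requirement.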

This removes some extra stores in the llvm test suite:

Metric: dse.NumRemainingStores
test-suite...nchmarkGame/spectral-norm.test     9.00     7.00   -22.2%
test-suite...abench/jpeg/jpeg-6a/cjpeg.test   6029.00  5709.00  -5.3%
test-suite...nsumer-jpeg/consumer-jpeg.test   6171.00  5851.00  -5.2%
test-suite...nchmarks/Misc/ReedSolomon.test   248.00   239.00   -3.6%
test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test   220.00   216.00   -1.8%
test-suite...marks/Ptrdist/yacr2/yacr2.test   314.00   310.00   -1.3%
test-suite...nchmarks/McCat/18-imp/imp.test   143.00   142.00   -0.7%
test-suite...ocBench/espresso/espresso.test   1776.00  1772.00  -0.2%
test-suite...pplications/oggenc/oggenc.test   4174.00  4165.00  -0.2%
test-suite.../Benchmarks/nbench/nbench.test   1375.00  1373.00  -0.1%
test-suite...lications/ClamAV/clamscan.test   10107.00 10099.00 -0.1%
test-suite...arks/mafft/pairlocalalign.test   5967.00  5963.00  -0.1%
test-suite...nsumer-lame/consumer-lame.test   4615.00  4612.00  -0.1%
test-suite...lications/sqlite3/sqlite3.test   16485.00 16478.00 -0.0%
test-suite...marks/7zip/7zip-benchmark.test   32005.00 31993.00 -0.0%
test-suite...ications/JM/lencod/lencod.test   13650.00 13645.00 -0.0%
test-suite...ications/JM/ldecod/ldecod.test   6124.00  6122.00  -0.0%
test-suite...rks/tramp3d-v4/tramp3d-v4.test   49150.00 49144.00 -0.0%
test-suite...-typeset/consumer-typeset.test   22146.00 22144.00 -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   26355.00 26354.00 -0.0%

The number of defchecks and walks generally goes up a little on average, whilst the number of override checks goes down. But different cases can act very differently. sqlite is still doing more work.

WDYT? This is hopefully conservatively correct. Let me know if you think of ways it would not be.

Could you please pre-commit the move of isOverwrite? Hard to see what changed otherwise.

Harbormaster completed remote builds in B104232: Diff 345077. May 13 2021, 3:26 AM

I've rebased over the movement of isOverwrite into DSEState (but not committed that yet, will do soon). I've also rebased over the tests, to show the ones that improve/get worse (and that most don't change).

Harbormaster completed remote builds in B104281: Diff 345126. May 13 2021, 7:22 AM

dmgreen mentioned this in rGf7cb654763ec: [DSE] Move isOverwrite into DSEState. NFC. May 14 2021, 1:17 AM

Rebase onto main. Any thoughts or comments on this now?

Herald added a subscriber: foad. May 24 2021, 2:34 AM

Harbormaster completed remote builds in B105863: Diff 347330. May 24 2021, 3:12 AM

Logic looks correct now, and new compile-time impact (https://llvm-compile-time-tracker.com/compare.php?from=e42636d3c1a41a9b7c5d8095ae5ef6682e26d4a2&to=502b1e5a2bd94706da3df367ddf558fdef7e2a2e&stat=instructions) also looks fine to me.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1050	While it ultimately shouldn't make a difference, I think this should be passing `Earlier` as the memory location.
1271	Doc comment is outdated now -- and the function should probably also get renamed as well.
1273	`const MemoryLocation &`

Changed IsGuaranteedLoopInvariant to isGuaranteedLoopInvariant (if this wasn't what you meant, let me know, and do you have suggestions for a better name?)
Reworded comments.
Fixed Later->Earlier memref, which then needed a rejig of one of the tests to move the initial memloc out of the entry block.

Harbormaster completed remote builds in B106162: Diff 347783. May 25 2021, 3:23 PM

LGTM

In D100464#2780642, @dmgreen wrote:

Changed IsGuaranteedLoopInvariant to isGuaranteedLoopInvariant (if this wasn't what you meant, let me know, and do you have suggestions for a better name?)

That was not what I had in mind, but after your comment update, I think this framing works :)

This revision is now accepted and ready to land. May 27 2021, 2:18 PM

LGTM, thanks! I think it would be good to update some of the naming/comments slightly (as per the comments) before committing.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1048	nit: 'the same location' may be a bit confusing here. I guess it's meant as the same `MemoryLocation` value, but we can store to different concrete locations for locations that depend on a loop induction for example.
1279	Referring to candidates & elimination here seems a bit out-of-place given the name & doc-comment of the function. It would be good to update the function name (doesn't check for invariance any longer). It now more accurately tries to rule out cross-iteration dependencies?

dmgreen mentioned this in rG222aeb4d51a4: [DSE] Remove stores in the same loop iteration. May 31 2021, 2:23 AM

I did some rewording, but I'm not sure it's really better than it was before. Feel free to update further if you see fit.

rG222aeb4d51a4

@dmgreen, we found that DSE is removing a non dead store in a loop with multiple backedges. File "dse-double-loop.ll" attached. The 2nd store postdominates the 1st one, but it doesn't run on every iteration, so the 1st one is still needed. We worked around it by checking numBackEdges > 1 in addition to the irreducibility check. Would you mind taking a look? I hope bugpoint did not over-reduce the test case; it's derived from "real" code in spec2000 186.crafty.

Attachment: dse-double-loop.ll (1 KB)
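The attached reproducer is not included in this archive. As a hedged illustration of the shape described in the report (not the actual test case), the C++ sketch below builds a loop with two backedges in which the second store lies on every path to the exit but is skipped on some iterations. `runTwoBackedges`, `Skip`, and `KeepFirstStore` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// KeepFirstStore models whether DSE kept or removed the first store.
std::vector<int> runTwoBackedges(const std::vector<bool> &Skip,
                                 bool KeepFirstStore) {
  assert(!Skip.empty() && !Skip.back() &&
         "the last iteration must reach the exit");
  std::vector<int> P(Skip.size(), 0);
  std::size_t I = 0;
  while (true) {
    if (KeepFirstStore)
      P[I] = 1;  // 1st store
    if (Skip[I]) {
      ++I;
      continue;  // first backedge: the 2nd store does not run this iteration
    }
    P[I] = 2;    // 2nd store: on every path from the 1st store to the exit
    if (++I >= Skip.size())
      break;     // the only loop exit
    // falling through here is the second backedge
  }
  return P;
}
```

With `Skip = {true, false}` the kept variant yields `{1, 2}` and the removed variant yields `{0, 2}`: the first store to `P[0]` is never overwritten, even though the second store postdominates it.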

Oh so it does. Thanks for the report. I thought we had a test case for that...

I'll revert this for the time being.

dmgreen mentioned this in rG297088d1add7: Revert "[DSE] Remove stores in the same loop iteration". Jun 8 2021, 1:23 PM

dmgreen mentioned this in rG0178ae734ca3: [DSE] Add another multiblock loop DSE test. NFC. Jun 8 2021, 1:55 PM

Thank you for the quick response! Much appreciated.

dmgreen mentioned this in rG562593ff82f8: [DSE] Extra multiblock loop tests, NFC.. Jun 13 2021, 2:33 PM

Reopening with adjusted code (and more test cases). The issues were related to the checks for all paths leading to the exit collectively postdominating the earlier dead store. But:

A store that walked back to itself was ignored, effectively killing itself. This was the issue that came up from the reproducer. It is only true if the access is loop invariant, so it now uses an extra isGuaranteedLoopInvariant() check.
Stores in blocks with later PO numbers were again effectively ignored. With loops they need to be handled, in this case by returning None.
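The second point can be illustrated with a small standalone sketch showing why post-order numbers alone do not settle execution order once a loop is involved. It is not DSE's code; `Graph`, `postOrderNumbers`, and `reaches` are hypothetical helpers, and block 0 is assumed to be the entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Succs[B] lists the successors of block B; block 0 is the entry.
using Graph = std::vector<std::vector<int>>;

static void visitPostOrder(const Graph &Succs, int B, std::vector<bool> &Seen,
                           std::vector<int> &PO, int &Next) {
  Seen[B] = true;
  for (int S : Succs[B])
    if (!Seen[S])
      visitPostOrder(Succs, S, Seen, PO, Next);
  PO[B] = Next++; // numbered once all unvisited successors are done
}

// Depth-first post-order number for every block reachable from the entry.
std::vector<int> postOrderNumbers(const Graph &Succs) {
  assert(!Succs.empty() && "block 0 is the entry");
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> PO(Succs.size(), -1);
  int Next = 0;
  visitPostOrder(Succs, 0, Seen, PO, Next);
  return PO;
}

// True when To can be reached from From by following at least one edge.
bool reaches(const Graph &Succs, int From, int To) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Work(Succs[From].begin(), Succs[From].end());
  while (!Work.empty()) {
    const int B = Work.back();
    Work.pop_back();
    if (B == To)
      return true;
    if (Seen[B])
      continue;
    Seen[B] = true;
    Work.insert(Work.end(), Succs[B].begin(), Succs[B].end());
  }
  return false;
}
```

For the loop `{{1}, {2}, {1, 3}, {}}`, the header (block 1) gets a later post-order number than the latch (block 2), yet the header still executes after the latch on the next iteration via the backedge. A store in such a block therefore cannot simply be ignored.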

The number of stores removed in the llvm test suite was almost the same as with the previous version of this patch. The total number of remaining stores was 3 more than in the last set of numbers.

Herald added a subscriber: ormris. Jun 16 2021, 5:11 AM

dmgreen reopened this revision. Jun 16 2021, 5:12 AM

This revision is now accepted and ready to land. Jun 16 2021, 5:12 AM

dmgreen requested review of this revision. Jun 16 2021, 5:12 AM

spec2000/186 passes now -- thanks.

This revision is now accepted and ready to land. Jun 16 2021, 11:19 AM

Harbormaster completed remote builds in B109490: Diff 352395. Jun 16 2021, 12:58 PM

In D100464#2822499, @clin1 wrote:

spec2000/186 passes now -- thanks.

Thanks for checking! We didn't see any failures here in our running of SPEC, I don't believe, so it's good to see it's fixed for you with the latest version. There must be something different about the way they get run, perhaps because of the differences in architecture.

dmgreen mentioned this in rGa24b02193a30: [DSE] Remove stores in the same loop iteration. Jun 20 2021, 9:05 AM

We see a big performance regression caused by this patch due to increased register pressure. We found that removing stores in the same iteration can increase register pressure dramatically. In the following piece of code, the group 1 and group 2 stores are eliminated and the register usage increases from 70 to 171. This is from a critical application of ours. I am not sure whether DSE could estimate register pressure (RP) here.

// group 1 stores
sw[0][j][i] = r0 * du0;
sw[1][j][i] = r0 * du1;
sw[2][j][i] = r0 * du2;

.....

#pragma unroll 8

for (int m = 0; m < 8; m++) {
  const double vs = sv[m][j];
  du0 += vs * sq[0][m][i];
  du1 += vs * sq[1][m][i];
  du2 += vs * sq[2][m][i];
}
// group 2 stores
sw[0][j][i] += r1 * du0;
sw[1][j][i] += r1 * du1;
sw[2][j][i] += r1 * du2;

......

#pragma unroll 8

for (int m = 0; m < 8; m++) {
  const double xt = st[m][k];
  dx0 += xt * ru[0][m];
  dx1 += xt * ru[1][m];
  dx2 += xt * ru[2][m];
}
// group 3 stores
sw[0][j][i] += r2 * dx0;
sw[1][j][i] += r2 * dx1;
sw[2][j][i] += r2 * dx2;

Any suggestions are welcome to resolve this issue. Thanks.

Was that code being vectorized before this patch?

In D100464#3136624, @cfang wrote:

Any suggestions are welcome to resolve this issue. Thanks.

Which pass is actually causing the increase in register pressure? I don't think DSE would be directly responsible, but rather some other transformation will be enabled and may go crazy. Figuring out which one and why would be a good start.

@dmgreen this review is still open, but the patch got recommitted again, right? Can it be closed?

Hello. Do you have a fuller example where this is causing the regressions? Is it something like this (https://godbolt.org/z/fW79sMhWP)? Or is there more to it than that?

Like the others have said, it's likely a good thing to remove dead stores, and it is other passes that should be handling the increase in register pressure more elegantly. Without a more complete example it's hard to say what or where, though.

I guess there must be more to it than the example above: if all the loops are unrolled, then all the instructions end up in the same block and DSE can remove all the stores, even before this patch.

Right. If we manually LICM the loads/stores in group 1 and group 2 (with something like sum_i), we see the higher register pressure. Further, if we intentionally insert dead stores to store the intermediate sum_i values, the register pressure can be lowered. Some later passes must be responsible.

In D100464#3136949, @fhahn wrote:

@dmgreen this review is still open, but the patch got recommitted again, right? Can it be closed?

Yep, this went in a long time ago now.

Revision Contents

Path                                                            Size
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp             54 lines
llvm/test/CodeGen/AMDGPU/opt-pipeline.ll                        4 lines
llvm/test/Other/opt-O2-pipeline.ll                              2 lines
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll                2 lines
llvm/test/Other/opt-O3-pipeline.ll                              2 lines
llvm/test/Other/opt-Os-pipeline.ll                              2 lines
llvm/test/Transforms/DeadStoreElimination/multiblock-loops.ll   101 lines

Diff 338014

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

[... 34 unchanged lines not shown ...]
 #include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/CaptureTracking.h"
 #include "llvm/Analysis/GlobalsModRef.h"
+#include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/MemoryLocation.h"
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/MemorySSAUpdater.h"
+#include "llvm/Analysis/MustExecute.h"
 #include "llvm/Analysis/PostDominators.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Argument.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DataLayout.h"
[... 912 unchanged lines not shown ...] struct DSEState {
   /// value pointer.
   BatchAAResults BatchAA;

   MemorySSA &MSSA;
   DominatorTree &DT;
   PostDominatorTree &PDT;
   const TargetLibraryInfo &TLI;
   const DataLayout &DL;
+  const LoopInfo &LI;
+
+  // Whether the function contains any irreducible control flow, useful for
+  // being accurately able to detect loops.
+  bool ContainsIrreducibleLoops;

   // All MemoryDefs that potentially could kill other MemDefs.
   SmallVector<MemoryDef *, 64> MemDefs;
   // Any that should be skipped as they are already deleted
   SmallPtrSet<MemoryAccess *, 4> SkipStores;
   // Keep track of all of the objects that are invisible to the caller before
   // the function returns.
   // SmallPtrSet<const Value *, 16> InvisibleToCallerBeforeRet;
   DenseMap<const Value *, bool> InvisibleToCallerBeforeRet;
   // Keep track of all of the objects that are invisible to the caller after
   // the function returns.
   DenseMap<const Value *, bool> InvisibleToCallerAfterRet;
   // Keep track of blocks with throwing instructions not modeled in MemorySSA.
   SmallPtrSet<BasicBlock *, 16> ThrowingBlocks;
   // Post-order numbers for each basic block. Used to figure out if memory
   // accesses are executed before another access.
   DenseMap<BasicBlock *, unsigned> PostOrderNumbers;

   /// Keep track of instructions (partly) overlapping with killing MemoryDefs per
   /// basic block.
   DenseMap<BasicBlock *, InstOverlapIntervalsTy> IOLs;

   DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,
-           PostDominatorTree &PDT, const TargetLibraryInfo &TLI)
+           PostDominatorTree &PDT, const TargetLibraryInfo &TLI,
+           const LoopInfo &LI)
       : F(F), AA(AA), BatchAA(AA, /*CacheOffsets =*/true), MSSA(MSSA), DT(DT),
-        PDT(PDT), TLI(TLI), DL(F.getParent()->getDataLayout()) {}
+        PDT(PDT), TLI(TLI), DL(F.getParent()->getDataLayout()), LI(LI) {}

   static DSEState get(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,
                       DominatorTree &DT, PostDominatorTree &PDT,
-                      const TargetLibraryInfo &TLI) {
-    DSEState State(F, AA, MSSA, DT, PDT, TLI);
+                      const TargetLibraryInfo &TLI, const LoopInfo &LI) {
+    DSEState State(F, AA, MSSA, DT, PDT, TLI, LI);
     // Collect blocks with throwing instructions not modeled in MemorySSA and
     // alloc-like objects.
     unsigned PO = 0;
     for (BasicBlock *BB : post_order(&F)) {
       State.PostOrderNumbers[BB] = PO++;
       for (Instruction &I : *BB) {
         MemoryAccess *MA = MSSA.getMemoryAccess(&I);
         if (I.mayThrow() && !MA)
[... 11 unchanged lines not shown ...] static DSEState get(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,
     for (Argument &AI : F.args())
       if (AI.hasPassPointeeByValueCopyAttr()) {
         // For byval, the caller doesn't know the address of the allocation.
         if (AI.hasByValAttr())
           State.InvisibleToCallerBeforeRet.insert({&AI, true});
         State.InvisibleToCallerAfterRet.insert({&AI, true});
       }

+    // Collect whether there is any irreducible control flow in the function.
+    State.ContainsIrreducibleLoops = mayContainIrreducibleControl(F, &LI);
+
     return State;
   }

   bool isInvisibleToCallerAfterRet(const Value *V) {
     if (isa<AllocaInst>(V))
       return true;
     auto I = InvisibleToCallerAfterRet.insert({V, false});
     if (I.second) {
       if (!isInvisibleToCallerBeforeRet(V)) {
         I.first->second = false;
       } else {
         auto *Inst = dyn_cast<Instruction>(V);
         if (Inst && isAllocLikeFn(Inst, &TLI))
           I.first->second = !PointerMayBeCaptured(V, true, false);
       }
[... 204 unchanged lines not shown ...] bool isReadClobber(const MemoryLocation &DefLoc, Instruction *UseInst) {
     // expensive analysis to limit compile-time.
     return isRefSet(BatchAA.getModRefInfo(UseInst, DefLoc));
   }

   /// Returns true if \p Ptr is guaranteed to be loop invariant for any possible
   /// loop. In particular, this guarantees that it only references a single
   /// MemoryLocation during execution of the containing function.
   bool IsGuaranteedLoopInvariant(Value *Ptr) {
     auto IsGuaranteedLoopInvariantBase = [this](Value *Ptr) {
       Ptr = Ptr->stripPointerCasts();
       if (auto *I = dyn_cast<Instruction>(Ptr)) {
         if (isa<AllocaInst>(Ptr))
           return true;

         if (isAllocLikeFn(I, &TLI))
           return true;

         return false;
       }
       return true;
     };

     Ptr = Ptr->stripPointerCasts();
     if (auto *I = dyn_cast<Instruction>(Ptr)) {
       if (I->getParent() == &I->getFunction()->getEntryBlock()) {
[... 12 unchanged lines not shown ...] struct DSEState {
   // if \p DefLoc is not accessible after the function returns. If there is no
   // such MemoryDef, return None. The returned value may not (completely)
   // overwrite \p DefLoc. Currently we bail out when we encounter an aliasing
   // MemoryUse (read).
   Optional<MemoryAccess *>
   getDomMemoryDef(MemoryDef *KillingDef, MemoryAccess *StartAccess,
                   const MemoryLocation &DefLoc, const Value *DefUO,
                   unsigned &ScanLimit, unsigned &WalkerStepLimit,
                   bool IsMemTerm, unsigned &PartialLimit) {
     if (ScanLimit == 0 || WalkerStepLimit == 0) {
       LLVM_DEBUG(dbgs() << "\n ... hit scan limit\n");
       return None;
     }

     MemoryAccess *Current = StartAccess;
     Instruction *KillingI = KillingDef->getMemoryInst();
     bool StepAgain;
▲ Show 20 Lines • Show All 91 Lines • ▼ Show 20 Lines	do {
if (!CurrentLoc) {		if (!CurrentLoc) {
StepAgain = true;		StepAgain = true;
Current = CurrentDef->getDefiningAccess();		Current = CurrentDef->getDefiningAccess();
continue;		continue;
}		}

// AliasAnalysis does not account for loops. Limit elimination to		// AliasAnalysis does not account for loops. Limit elimination to
// candidates for which we can guarantee they always store to the same		// candidates for which we can guarantee they always store to the same
// memory location and not multiple locations in a loop.		// memory location and not located in different loops. But also be
if (Current->getBlock() != KillingDef->getBlock() &&		// careful with irreducible control flow, which can create cycles without
		// appearing as loops.
		if (((!ContainsIrreducibleLoops &&
		LI.getLoopFor(Current->getBlock()) !=
		LI.getLoopFor(KillingDef->getBlock())) \|\|
		(ContainsIrreducibleLoops &&
		Current->getBlock() != KillingDef->getBlock())) &&
!IsGuaranteedLoopInvariant(const_cast<Value *>(CurrentLoc->Ptr))) {		!IsGuaranteedLoopInvariant(const_cast<Value *>(CurrentLoc->Ptr))) {
		LLVM_DEBUG(dbgs() << " ... not guaranteed loop invariant\n");
		nikic: I'm not sure this logic is correct. The problem I see is that LoopInfo only tracks natural loops. This means that even though Current and KillingDef might be in the same natural loop (say the nullptr loop), Current might still be part of a (non-natural) cycle, in which case KillingDef may not kill it. I'm not particularly familiar with LoopInfo, but that's my understanding of how it works.
		fhahn (replying to nikic): Yes, I think that's correct. There might be cycles that LoopInfo does not detect, so we should not rely on comparing the loops alone. If there is no loop for a given block, the blocks still might be in a cycle that LoopInfo has not detected due to irreducible control flow; if there is a loop for a given block, the block could be in a cycle that is completely contained in the loop we found. If LI finds a loop for a block, I think it is guaranteed that the loop is only entered through the header, though. I think this should allow us to treat the `phis` in the header as being in the same iteration or invariant in any undetected cycles in the loop.
		dmgreen (author): Ah, excellent point. I had not considered irreducible loops, and they are not the kind of thing that naturally comes up in testing. (Unfortunately, I feel like all the testing I could try would already have been run by Florian before DSE was committed, and it didn't show anything then, just as it would not show any problems now. csmith didn't feel like much help.) I was originally more worried about opening the door to any number of unrelated issues coming up from allowing DSE from different blocks where both LI lookups return nullptr. For the moment, I have added a check using mayContainIrreducibleControl on the function and bailing if there are any. What do you think? Too much of a blunt hammer, or an OK way to handle this?
		nikic: I think that's a good way to handle it. Dealing with irreducible control flow in full generality is unlikely to be worthwhile. Might want to extract this condition into a separate function, as it's getting a bit hard to read...
StepAgain = true;		StepAgain = true;
Current = CurrentDef->getDefiningAccess();		Current = CurrentDef->getDefiningAccess();
WalkerStepLimit -= 1;		WalkerStepLimit -= 1;
continue;		continue;
}		}

if (IsMemTerm) {		if (IsMemTerm) {
// If the killing def is a memory terminator (e.g. lifetime.end), check		// If the killing def is a memory terminator (e.g. lifetime.end), check
[...]	if (StoredConstant && StoredConstant->isNullValue()) {
MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(Def);		MSSA.getSkipSelfWalker()->getClobberingMemoryAccess(Def);
return UnderlyingDef == ClobberDef;		return UnderlyingDef == ClobberDef;
}		}
}		}
return false;		return false;
}		}
};		};

bool eliminateDeadStores(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,		static bool eliminateDeadStores(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,
DominatorTree &DT, PostDominatorTree &PDT,		DominatorTree &DT, PostDominatorTree &PDT,
const TargetLibraryInfo &TLI) {		const TargetLibraryInfo &TLI,
		const LoopInfo &LI) {
bool MadeChange = false;		bool MadeChange = false;

DSEState State = DSEState::get(F, AA, MSSA, DT, PDT, TLI);		DSEState State = DSEState::get(F, AA, MSSA, DT, PDT, TLI, LI);
// For each store:		// For each store:
for (unsigned I = 0; I < State.MemDefs.size(); I++) {		for (unsigned I = 0; I < State.MemDefs.size(); I++) {
MemoryDef *KillingDef = State.MemDefs[I];		MemoryDef *KillingDef = State.MemDefs[I];
if (State.SkipStores.count(KillingDef))		if (State.SkipStores.count(KillingDef))
continue;		continue;
Instruction *SI = KillingDef->getMemoryInst();		Instruction *SI = KillingDef->getMemoryInst();

Optional<MemoryLocation> MaybeSILoc;		Optional<MemoryLocation> MaybeSILoc;
[...]	for (unsigned I = 0; I < State.MemDefs.size(); I++) {
bool Shortend = false;		bool Shortend = false;
bool IsMemTerm = State.isMemTerminatorInst(SI);		bool IsMemTerm = State.isMemTerminatorInst(SI);
// Check if MemoryAccesses in the worklist are killed by KillingDef.		// Check if MemoryAccesses in the worklist are killed by KillingDef.
for (unsigned I = 0; I < ToCheck.size(); I++) {		for (unsigned I = 0; I < ToCheck.size(); I++) {
Current = ToCheck[I];		Current = ToCheck[I];
if (State.SkipStores.count(Current))		if (State.SkipStores.count(Current))
continue;		continue;

Optional<MemoryAccess *> Next = State.getDomMemoryDef(		Optional<MemoryAccess *> Next =
KillingDef, Current, SILoc, SILocUnd, ScanLimit, WalkerStepLimit,		State.getDomMemoryDef(KillingDef, Current, SILoc, SILocUnd, ScanLimit,
IsMemTerm, PartialLimit);		WalkerStepLimit, IsMemTerm, PartialLimit);

if (!Next) {		if (!Next) {
LLVM_DEBUG(dbgs() << " finished walk\n");		LLVM_DEBUG(dbgs() << " finished walk\n");
continue;		continue;
}		}

MemoryAccess *EarlierAccess = *Next;		MemoryAccess *EarlierAccess = *Next;
LLVM_DEBUG(dbgs() << " Checking if we can kill " << *EarlierAccess);		LLVM_DEBUG(dbgs() << " Checking if we can kill " << *EarlierAccess);
[...]
// DSE Pass		// DSE Pass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {		PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {
AliasAnalysis &AA = AM.getResult<AAManager>(F);		AliasAnalysis &AA = AM.getResult<AAManager>(F);
const TargetLibraryInfo &TLI = AM.getResult<TargetLibraryAnalysis>(F);		const TargetLibraryInfo &TLI = AM.getResult<TargetLibraryAnalysis>(F);
DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);		DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);
MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();		MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
PostDominatorTree &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);		PostDominatorTree &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
		LoopInfo &LI = AM.getResult<LoopAnalysis>(F);

bool Changed = eliminateDeadStores(F, AA, MSSA, DT, PDT, TLI);		bool Changed = eliminateDeadStores(F, AA, MSSA, DT, PDT, TLI, LI);

#ifdef LLVM_ENABLE_STATS		#ifdef LLVM_ENABLE_STATS
if (AreStatisticsEnabled())		if (AreStatisticsEnabled())
for (auto &I : instructions(F))		for (auto &I : instructions(F))
NumRemainingStores += isa<StoreInst>(&I);		NumRemainingStores += isa<StoreInst>(&I);
#endif		#endif

if (!Changed)		if (!Changed)
return PreservedAnalyses::all();		return PreservedAnalyses::all();

PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
		fhahn: Should also preserve `LoopAnalysis`?
PA.preserve<GlobalsAA>();		PA.preserve<GlobalsAA>();
PA.preserve<MemorySSAAnalysis>();		PA.preserve<MemorySSAAnalysis>();
		PA.preserve<LoopAnalysis>();
return PA;		return PA;
}		}

namespace {		namespace {

/// A legacy pass for the legacy pass manager that wraps \c DSEPass.		/// A legacy pass for the legacy pass manager that wraps \c DSEPass.
class DSELegacyPass : public FunctionPass {		class DSELegacyPass : public FunctionPass {
public:		public:
[...]	bool runOnFunction(Function &F) override {

AliasAnalysis &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();		AliasAnalysis &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
const TargetLibraryInfo &TLI =		const TargetLibraryInfo &TLI =
getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
MemorySSA &MSSA = getAnalysis<MemorySSAWrapperPass>().getMSSA();		MemorySSA &MSSA = getAnalysis<MemorySSAWrapperPass>().getMSSA();
PostDominatorTree &PDT =		PostDominatorTree &PDT =
getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();		getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
		LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();

bool Changed = eliminateDeadStores(F, AA, MSSA, DT, PDT, TLI);		bool Changed = eliminateDeadStores(F, AA, MSSA, DT, PDT, TLI, LI);

#ifdef LLVM_ENABLE_STATS		#ifdef LLVM_ENABLE_STATS
if (AreStatisticsEnabled())		if (AreStatisticsEnabled())
for (auto &I : instructions(F))		for (auto &I : instructions(F))
NumRemainingStores += isa<StoreInst>(&I);		NumRemainingStores += isa<StoreInst>(&I);
#endif		#endif

return Changed;		return Changed;
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
AU.addRequired<PostDominatorTreeWrapperPass>();		AU.addRequired<PostDominatorTreeWrapperPass>();
AU.addRequired<MemorySSAWrapperPass>();		AU.addRequired<MemorySSAWrapperPass>();
AU.addPreserved<PostDominatorTreeWrapperPass>();		AU.addPreserved<PostDominatorTreeWrapperPass>();
AU.addPreserved<MemorySSAWrapperPass>();		AU.addPreserved<MemorySSAWrapperPass>();
		AU.addRequired<LoopInfoWrapperPass>();
		AU.addPreserved<LoopInfoWrapperPass>();
}		}
};		};

} // end anonymous namespace		} // end anonymous namespace

char DSELegacyPass::ID = 0;		char DSELegacyPass::ID = 0;

INITIALIZE_PASS_BEGIN(DSELegacyPass, "dse", "Dead Store Elimination", false,		INITIALIZE_PASS_BEGIN(DSELegacyPass, "dse", "Dead Store Elimination", false,
[...]
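
The inline discussion on the loop check in `getDomMemoryDef` ends with a suggestion to extract the condition into a separate function. A minimal, self-contained sketch of what that might look like follows. `Block`, `Loop`, `MockLoopInfo`, and `needsLoopInvariantPointer` are hypothetical stand-ins introduced for illustration; they are not LLVM types and not part of this patch. Only the boolean structure mirrors the condition in the diff.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for llvm::BasicBlock and llvm::Loop.
struct Block { int Id; };
struct Loop { int Id; };

// Hypothetical stand-in for LoopInfo: maps a block to its innermost natural
// loop, or nullptr when the block is in no natural loop.
struct MockLoopInfo {
  std::map<const Block *, const Loop *> InnermostLoop;
  const Loop *getLoopFor(const Block *B) const {
    auto It = InnermostLoop.find(B);
    return It == InnermostLoop.end() ? nullptr : It->second;
  }
};

// Returns true when the candidate store's pointer must additionally be
// proven loop invariant before the store may be eliminated.
bool needsLoopInvariantPointer(const MockLoopInfo &LI,
                               bool ContainsIrreducibleLoops,
                               const Block *Current, const Block *Killing) {
  // With irreducible control flow, loop info may miss cycles, so fall back
  // to the conservative same-block comparison.
  if (ContainsIrreducibleLoops)
    return Current != Killing;
  // Otherwise, blocks with the same innermost loop are treated as being in
  // the same iteration.
  return LI.getLoopFor(Current) != LI.getLoopFor(Killing);
}
```

Written this way, the two cases from the diff (reducible versus possibly irreducible control flow) read as separate early returns instead of one nested expression.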

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll

	[...]
	; GCN-O2-NEXT: Jump Threading			; GCN-O2-NEXT: Jump Threading
	; GCN-O2-NEXT: Value Propagation			; GCN-O2-NEXT: Value Propagation
	; GCN-O2-NEXT: Post-Dominator Tree Construction			; GCN-O2-NEXT: Post-Dominator Tree Construction
	; GCN-O2-NEXT: Aggressive Dead Code Elimination			; GCN-O2-NEXT: Aggressive Dead Code Elimination
	; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)			; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O2-NEXT: Function Alias Analysis Results			; GCN-O2-NEXT: Function Alias Analysis Results
	; GCN-O2-NEXT: Memory SSA			; GCN-O2-NEXT: Memory SSA
	; GCN-O2-NEXT: MemCpy Optimization			; GCN-O2-NEXT: MemCpy Optimization
	; GCN-O2-NEXT: Dead Store Elimination
	; GCN-O2-NEXT: Natural Loop Information			; GCN-O2-NEXT: Natural Loop Information
				; GCN-O2-NEXT: Dead Store Elimination
	; GCN-O2-NEXT: Canonicalize natural loops			; GCN-O2-NEXT: Canonicalize natural loops
	; GCN-O2-NEXT: LCSSA Verifier			; GCN-O2-NEXT: LCSSA Verifier
	; GCN-O2-NEXT: Loop-Closed SSA Form Pass			; GCN-O2-NEXT: Loop-Closed SSA Form Pass
	; GCN-O2-NEXT: Function Alias Analysis Results			; GCN-O2-NEXT: Function Alias Analysis Results
	; GCN-O2-NEXT: Scalar Evolution Analysis			; GCN-O2-NEXT: Scalar Evolution Analysis
	; GCN-O2-NEXT: Lazy Branch Probability Analysis			; GCN-O2-NEXT: Lazy Branch Probability Analysis
	; GCN-O2-NEXT: Lazy Block Frequency Analysis			; GCN-O2-NEXT: Lazy Block Frequency Analysis
	; GCN-O2-NEXT: Loop Pass Manager			; GCN-O2-NEXT: Loop Pass Manager
	[...]
	; GCN-O3-NEXT: Jump Threading			; GCN-O3-NEXT: Jump Threading
	; GCN-O3-NEXT: Value Propagation			; GCN-O3-NEXT: Value Propagation
	; GCN-O3-NEXT: Post-Dominator Tree Construction			; GCN-O3-NEXT: Post-Dominator Tree Construction
	; GCN-O3-NEXT: Aggressive Dead Code Elimination			; GCN-O3-NEXT: Aggressive Dead Code Elimination
	; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)			; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O3-NEXT: Function Alias Analysis Results			; GCN-O3-NEXT: Function Alias Analysis Results
	; GCN-O3-NEXT: Memory SSA			; GCN-O3-NEXT: Memory SSA
	; GCN-O3-NEXT: MemCpy Optimization			; GCN-O3-NEXT: MemCpy Optimization
	; GCN-O3-NEXT: Dead Store Elimination
	; GCN-O3-NEXT: Natural Loop Information			; GCN-O3-NEXT: Natural Loop Information
				; GCN-O3-NEXT: Dead Store Elimination
	; GCN-O3-NEXT: Canonicalize natural loops			; GCN-O3-NEXT: Canonicalize natural loops
	; GCN-O3-NEXT: LCSSA Verifier			; GCN-O3-NEXT: LCSSA Verifier
	; GCN-O3-NEXT: Loop-Closed SSA Form Pass			; GCN-O3-NEXT: Loop-Closed SSA Form Pass
	; GCN-O3-NEXT: Function Alias Analysis Results			; GCN-O3-NEXT: Function Alias Analysis Results
	; GCN-O3-NEXT: Scalar Evolution Analysis			; GCN-O3-NEXT: Scalar Evolution Analysis
	; GCN-O3-NEXT: Lazy Branch Probability Analysis			; GCN-O3-NEXT: Lazy Branch Probability Analysis
	; GCN-O3-NEXT: Lazy Block Frequency Analysis			; GCN-O3-NEXT: Lazy Block Frequency Analysis
	; GCN-O3-NEXT: Loop Pass Manager			; GCN-O3-NEXT: Loop Pass Manager
	[...]

llvm/test/Other/opt-O2-pipeline.ll

	[...]
	; CHECK-NEXT: Jump Threading			; CHECK-NEXT: Jump Threading
	; CHECK-NEXT: Value Propagation			; CHECK-NEXT: Value Propagation
	; CHECK-NEXT: Post-Dominator Tree Construction			; CHECK-NEXT: Post-Dominator Tree Construction
	; CHECK-NEXT: Aggressive Dead Code Elimination			; CHECK-NEXT: Aggressive Dead Code Elimination
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory SSA			; CHECK-NEXT: Memory SSA
	; CHECK-NEXT: MemCpy Optimization			; CHECK-NEXT: MemCpy Optimization
	; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	[...]

llvm/test/Other/opt-O3-pipeline-enable-matrix.ll

	[...]
	; CHECK-NEXT: Jump Threading			; CHECK-NEXT: Jump Threading
	; CHECK-NEXT: Value Propagation			; CHECK-NEXT: Value Propagation
	; CHECK-NEXT: Post-Dominator Tree Construction			; CHECK-NEXT: Post-Dominator Tree Construction
	; CHECK-NEXT: Aggressive Dead Code Elimination			; CHECK-NEXT: Aggressive Dead Code Elimination
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory SSA			; CHECK-NEXT: Memory SSA
	; CHECK-NEXT: MemCpy Optimization			; CHECK-NEXT: MemCpy Optimization
	; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	[...]

llvm/test/Other/opt-O3-pipeline.ll

	[...]
	; CHECK-NEXT: Jump Threading			; CHECK-NEXT: Jump Threading
	; CHECK-NEXT: Value Propagation			; CHECK-NEXT: Value Propagation
	; CHECK-NEXT: Post-Dominator Tree Construction			; CHECK-NEXT: Post-Dominator Tree Construction
	; CHECK-NEXT: Aggressive Dead Code Elimination			; CHECK-NEXT: Aggressive Dead Code Elimination
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory SSA			; CHECK-NEXT: Memory SSA
	; CHECK-NEXT: MemCpy Optimization			; CHECK-NEXT: MemCpy Optimization
	; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	[...]

llvm/test/Other/opt-Os-pipeline.ll

	[...]
	; CHECK-NEXT: Jump Threading			; CHECK-NEXT: Jump Threading
	; CHECK-NEXT: Value Propagation			; CHECK-NEXT: Value Propagation
	; CHECK-NEXT: Post-Dominator Tree Construction			; CHECK-NEXT: Post-Dominator Tree Construction
	; CHECK-NEXT: Aggressive Dead Code Elimination			; CHECK-NEXT: Aggressive Dead Code Elimination
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory SSA			; CHECK-NEXT: Memory SSA
	; CHECK-NEXT: MemCpy Optimization			; CHECK-NEXT: MemCpy Optimization
	; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	[...]

llvm/test/Transforms/DeadStoreElimination/multiblock-loops.ll

	[...]
	; CHECK-NEXT: br i1 [[CMP27]], label [[FOR_BODY4_LR_PH_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]			; CHECK-NEXT: br i1 [[CMP27]], label [[FOR_BODY4_LR_PH_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
	; CHECK: for.body4.lr.ph.preheader:			; CHECK: for.body4.lr.ph.preheader:
	; CHECK-NEXT: br label [[FOR_BODY4_LR_PH:%.*]]			; CHECK-NEXT: br label [[FOR_BODY4_LR_PH:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body4.lr.ph:			; CHECK: for.body4.lr.ph:
	; CHECK-NEXT: [[I_028:%.*]] = phi i32 [ [[INC11:%.*]], [[FOR_COND_CLEANUP3:%.*]] ], [ 0, [[FOR_BODY4_LR_PH_PREHEADER]] ]			; CHECK-NEXT: [[I_028:%.*]] = phi i32 [ [[INC11:%.*]], [[FOR_COND_CLEANUP3:%.*]] ], [ 0, [[FOR_BODY4_LR_PH_PREHEADER]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 [[I_028]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i32 [[I_028]]
	; CHECK-NEXT: store i32 0, i32* [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[I_028]], [[N]]			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[I_028]], [[N]]
	; CHECK-NEXT: br label [[FOR_BODY4:%.*]]			; CHECK-NEXT: br label [[FOR_BODY4:%.*]]
	; CHECK: for.body4:			; CHECK: for.body4:
	; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ 0, [[FOR_BODY4_LR_PH]] ], [ [[ADD9:%.*]], [[FOR_BODY4]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ 0, [[FOR_BODY4_LR_PH]] ], [ [[ADD9:%.*]], [[FOR_BODY4]] ]
	; CHECK-NEXT: [[J_026:%.*]] = phi i32 [ 0, [[FOR_BODY4_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY4]] ]			; CHECK-NEXT: [[J_026:%.*]] = phi i32 [ 0, [[FOR_BODY4_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY4]] ]
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[J_026]], [[MUL]]			; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[J_026]], [[MUL]]
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[ADD]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[ADD]]
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX5]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ARRAYIDX5]], align 4
	[...]
	if.end: ; preds = %do.body			if.end: ; preds = %do.body
	%inc = add nuw nsw i16 %i.0, 1			%inc = add nuw nsw i16 %i.0, 1
	br label %do.body			br label %do.body

	if.end10: ; preds = %do.body			if.end10: ; preds = %do.body
	store i16 1, i16* %arrayidx2, align 1			store i16 1, i16* %arrayidx2, align 1
	ret i16 0			ret i16 0
	}			}

				; Similar to above, but with an irreducible loop. The stores should not be removed.
				define i16 @irreducible(i1 %c) {
				; CHECK-LABEL: @irreducible(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 [[C:%.*]], label [[A:%.*]], label [[B:%.*]]
				; CHECK: A:
				; CHECK-NEXT: [[I_0:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[B]] ]
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 [[I_0]]
				; CHECK-NEXT: br label [[B]]
				; CHECK: B:
				; CHECK-NEXT: [[J_0:%.*]] = phi i16 [ 0, [[ENTRY]] ], [ [[I_0]], [[A]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 [[J_0]]
				; CHECK-NEXT: store i16 2, i16* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nuw nsw i16 [[J_0]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i16 [[J_0]], 4
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[EXIT:%.*]], label [[A]]
				; CHECK: exit:
				; CHECK-NEXT: store i16 1, i16* [[ARRAYIDX]], align 1
				; CHECK-NEXT: ret i16 0
				;
				entry:
				br i1 %c, label %A, label %B

				A:
				%i.0 = phi i16 [ 0, %entry ], [ %inc, %B ]
				%arrayidx2 = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 %i.0
				br label %B

				B:
				%j.0 = phi i16 [ 0, %entry ], [ %i.0, %A ]
				%arrayidx = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 %j.0
				store i16 2, i16* %arrayidx, align 1
				%inc = add nuw nsw i16 %j.0, 1
				%exitcond = icmp eq i16 %j.0, 4
				br i1 %exitcond, label %exit, label %A

				exit:
				store i16 1, i16* %arrayidx, align 1
				ret i16 0
				}

				; An irreducible loop inside another loop.
				define i16 @irreducible_nested() {
				; CHECK-LABEL: @irreducible_nested(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[OUTER:%.*]]
				; CHECK: outer:
				; CHECK-NEXT: [[X:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[INCX:%.*]], [[OUTERL:%.*]] ]
				; CHECK-NEXT: [[C:%.*]] = icmp sgt i16 [[X]], 2
				; CHECK-NEXT: br i1 [[C]], label [[A:%.*]], label [[B:%.*]]
				; CHECK: A:
				; CHECK-NEXT: [[I_0:%.*]] = phi i16 [ 0, [[OUTER]] ], [ [[INC:%.*]], [[B]] ]
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 [[I_0]]
				; CHECK-NEXT: br label [[B]]
				; CHECK: B:
				; CHECK-NEXT: [[J_0:%.*]] = phi i16 [ 0, [[OUTER]] ], [ [[I_0]], [[A]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 [[J_0]]
				; CHECK-NEXT: store i16 2, i16* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nuw nsw i16 [[J_0]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i16 [[J_0]], 4
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[OUTERL]], label [[A]]
				; CHECK: outerl:
				; CHECK-NEXT: store i16 1, i16* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INCX]] = add nuw nsw i16 [[X]], 1
				; CHECK-NEXT: [[EXITCONDX:%.*]] = icmp eq i16 [[X]], 4
				; CHECK-NEXT: br i1 [[EXITCONDX]], label [[END:%.*]], label [[OUTER]]
				; CHECK: end:
				; CHECK-NEXT: ret i16 0
				;
				entry:
				br label %outer

				outer:
				%x = phi i16 [ 0, %entry ], [ %incx, %outerl ]
				%c = icmp sgt i16 %x, 2
				br i1 %c, label %A, label %B

				A:
				%i.0 = phi i16 [ 0, %outer ], [ %inc, %B ]
				%arrayidx2 = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 %i.0
				br label %B

				B:
				%j.0 = phi i16 [ 0, %outer ], [ %i.0, %A ]
				%arrayidx = getelementptr inbounds [10 x i16], [10 x i16]* @x, i16 0, i16 %j.0
				store i16 2, i16* %arrayidx, align 1
				%inc = add nuw nsw i16 %j.0, 1
				%exitcond = icmp eq i16 %j.0, 4
				br i1 %exitcond, label %outerl, label %A

				outerl:
				store i16 1, i16* %arrayidx, align 1
				%incx = add nuw nsw i16 %x, 1
				%exitcondx = icmp eq i16 %x, 4
				br i1 %exitcondx, label %end, label %outer

				end:
				ret i16 0
				}
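
The two tests above rely on cycles that `LoopInfo` does not report as natural loops. As a rough illustration of why the `@irreducible` CFG qualifies, the sketch below applies the textbook criterion: a CFG is irreducible when a depth-first search finds a retreating edge whose target does not dominate its source. This is not how LLVM's `mayContainIrreducibleControl` is implemented; `Graph`, `computeDominators`, `dfsFindsIrreducibleEdge`, and `containsIrreducibleCycle` are names assumed here, and every node is assumed to be reachable from node 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Successor lists; node 0 is the entry block (assumption of this sketch).
using Graph = std::vector<std::vector<int>>;

// Iterative dataflow: Dom[N][D] is true when D dominates N.
static std::vector<std::vector<bool>> computeDominators(const Graph &G) {
  const std::size_t N = G.size();
  std::vector<std::vector<int>> Preds(N);
  for (std::size_t U = 0; U < N; ++U)
    for (int V : G[U])
      Preds[V].push_back(static_cast<int>(U));

  std::vector<std::vector<bool>> Dom(N, std::vector<bool>(N, true));
  Dom[0].assign(N, false);
  Dom[0][0] = true;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t V = 1; V < N; ++V) {
      // Intersect the dominator sets of all predecessors, then add V itself.
      std::vector<bool> New(N, true);
      for (int P : Preds[V])
        for (std::size_t D = 0; D < N; ++D)
          New[D] = New[D] && Dom[P][D];
      New[V] = true;
      if (New != Dom[V]) {
        Dom[V] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// State: 0 = unvisited, 1 = on the DFS stack, 2 = finished.
static bool dfsFindsIrreducibleEdge(const Graph &G,
                                    const std::vector<std::vector<bool>> &Dom,
                                    int U, std::vector<int> &State) {
  State[U] = 1;
  for (int V : G[U]) {
    // Retreating edge U -> V whose target does not dominate its source.
    if (State[V] == 1 && !Dom[U][V])
      return true;
    if (State[V] == 0 && dfsFindsIrreducibleEdge(G, Dom, V, State))
      return true;
  }
  State[U] = 2;
  return false;
}

bool containsIrreducibleCycle(const Graph &G) {
  if (G.empty())
    return false;
  const auto Dom = computeDominators(G);
  std::vector<int> State(G.size(), 0);
  return dfsFindsIrreducibleEdge(G, Dom, 0, State);
}
```

Encoding `@irreducible` as entry=0, A=1, B=2, exit=3 gives the graph `{{1, 2}, {2}, {3, 1}, {}}`. The edge B -> A is retreating, yet A does not dominate B because entry also branches to B directly, so the A/B cycle has two entries and is not a natural loop.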

Diff 338014