This is an archive of the discontinued LLVM Phabricator instance.


Differential D109280

[WIP][DSE] Remove memset that is overwritten by a store in a loop
Needs Review · Public

Authored by vdsered on Sep 4 2021, 1:45 PM.


This revision needs review, but there are no reviewers specified.

Details

Reviewers: None

Summary

NOTE: This patch is still in progress

LLVM cannot eliminate a redundant store in a loop, or a redundant memset, when a store in a later loop overwrites the same memory region.

In practice, this happens in cases like "Fail to eliminate deadstore from vector resize", where an array or vector is prefilled with some constant and a for-loop then writes something else to it.

These use cases basically look like this:

for (int i = 0; i < N; ++i)
   P[i] = 1;
for (int i = 0; i < N; ++i)
   P[i] = 1;

memset(P, 0, N * sizeof(int))
for (int i = 0; i < N; ++i)
   P[i] = 1;

We have already partially solved this problem with LoopIdiomRecognize, which transforms the loop into a memset that DSE can then remove, as in this example.

This patch only generalizes that to cases where LIR fails to transform a loop with a single basic block, as in this example, so the memset cannot be removed because alias analysis gives up.

Subsequent patches will handle more cases, so the problem is fixed incrementally.
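
As a rough model of the memset-plus-loop pattern: the memset is dead when the loop's stores cover exactly the bytes the memset wrote. The sketch below loosely follows computeMemoryRange from this revision's diff, with plain integers standing in for SCEV expressions. ByteRange, loopStoreRange and loopOverwritesMemset are hypothetical names used only for illustration, and the sketch assumes the store address advances by exactly the store size on every iteration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the patch's ContinuousMemoryRange: a start
// address and a length in bytes (the patch keeps SCEV expressions instead).
struct ByteRange {
  std::uint64_t Start;
  std::uint64_t Length;
};

bool operator==(const ByteRange &A, const ByteRange &B) {
  return A.Start == B.Start && A.Length == B.Length;
}

// Bytes written by a loop that stores StoreSize bytes per iteration to
// consecutive addresses starting at Base. The loop body executes
// BackedgeTakenCount + 1 times.
ByteRange loopStoreRange(std::uint64_t Base, std::uint64_t BackedgeTakenCount,
                         std::uint64_t StoreSize) {
  assert(StoreSize > 0 && "a store writes at least one byte");
  return ByteRange{Base, (BackedgeTakenCount + 1) * StoreSize};
}

// In this model the memset is dead only if the later loop writes exactly
// the same byte range.
bool loopOverwritesMemset(std::uint64_t LoopBase,
                          std::uint64_t BackedgeTakenCount,
                          std::uint64_t StoreSize, std::uint64_t MemsetDest,
                          std::uint64_t MemsetLength) {
  return loopStoreRange(LoopBase, BackedgeTakenCount, StoreSize) ==
         ByteRange{MemsetDest, MemsetLength};
}
```

For a 400-byte memset followed by 100 iterations of a 4-byte store to the same base (backedge-taken count 99), both ranges are 400 bytes long and the memset is removable in this model. With only 99 iterations the ranges differ and the memset has to stay.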

Diff Detail

Event Timeline

vdsered created this revision. Sep 4 2021, 1:45 PM

Herald added a subscriber: hiraditya. Sep 4 2021, 1:45 PM

vdsered requested review of this revision. Sep 4 2021, 1:45 PM

Herald added a project: Restricted Project. Sep 4 2021, 1:45 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B122650: Diff 370761. Sep 4 2021, 2:27 PM

hafixo added a commit: rCRT373035: hwasan: Compatibility fixes for short granules. Sep 6 2021, 12:44 AM

hafixo added a commit: rGc336557f0238: hwasan: Compatibility fixes for short granules. Sep 6 2021, 12:47 AM

thopre removed a commit: rGc336557f0238: hwasan: Compatibility fixes for short granules. Sep 7 2021, 2:47 AM

thopre removed a commit: rCRT373035: hwasan: Compatibility fixes for short granules. Sep 7 2021, 2:51 AM

Added one test for a loop with several latches where we don't always overwrite the region that the memory intrinsic writes to
Fixed formatting
Replaced typed pointers with opaque pointers (--force-opaque-pointers is used in tests)
Fixed crash caused by an assertion for multiplication of operands with different types
Optimized ContinuousMemoryRegion and removed one field from there
Added a feature flag EnableMemIntrinsicEliminationByStores to enable/disable this feature at runtime (probably not the best naming)

Harbormaster completed remote builds in B123108: Diff 371432. Sep 8 2021, 2:23 PM

Rebase + fixed opt-pipeline.ll tests for AMDGPU

Herald added subscribers: kerbowa, nhaehnle, jvesely. Sep 8 2021, 8:20 PM

Harbormaster completed remote builds in B123154: Diff 371495. Sep 8 2021, 9:11 PM

Whitney added a subscriber: Whitney. Sep 13 2021, 8:59 AM

Simplified code by removing unnecessary if-statements
Turned AddMemRange lambda to a method
memset can be eliminated when memset has constant length and loop has constant iteration count + added a test for this

Harbormaster completed remote builds in B124048: Diff 372746. Sep 15 2021, 11:40 AM

Transform loops into loop rotated simplify form
Use guards instead of directly finding icmps
Implemented optimization for loop + loop
Decreased the number of times when we compute range for memory intrinsics
Run the DSE loop twice: first as before, then a second time for optimization across loops

For all three of your examples, shouldn't LoopIdiomRecognize just catch them and produce intrinsics? (Not sure about the ordering of DSE and LoopIdiom.)

Any other motivating cases/benchmarks?

Currently I see no reason to increase the complexity of DSE, as LoopIdiom + DSE should work fine together to handle them.

Results for the current patch on SingleSource/MultiSource are in the table below. The other benchmarks show no difference. The metric is dse.RemainingNumStores.

name	baseline	experiment	diff
security-blowfish	159.00	160.00	0.6%
ReedSolomon	209.00	210.00	0.5%
paq8p	2041.00	2044.00	0.1%
kc	51975.00	51974.00	-0.0%
CLAMR	20194.00	20192.00	-0.0%
consumer-typeset	22144.00	22139.00	-0.0%
oggenc	4229.00	4228.00	-0.0%
make_dparser	3033.00	3032.00	-0.0%
ldecod	6044.00	6042.00	-0.0%
sim	329.00	328.00	-0.3%
espresso	1772.00	1766.00	-0.3%
anagram	120.00	119.00	-0.8%

I'm not sure why it removes fewer stores in the first three tests.

Hi, @xbolva00. Are you sure that we can use loop idiom to transform an arbitrary loop with a store into, let's say, a memset?

Regarding memset, it accepts an i8 value. I know it can transform a loop with stores like store i32 0, i32* %9 or e.g. store i32 286331153, i32* %9, but it would fail on an example where we write store i32 1, i32* %9
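
To make the i8 restriction concrete: a 32-bit store can be expressed as a plain memset only if all four of its bytes are equal. Below is a small standalone check of the three constants mentioned above. isByteSplat is a hypothetical helper written for this illustration, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// True if all four bytes of Value are identical, i.e. the 32-bit store
// could be expressed as a memset with that single byte.
bool isByteSplat(std::uint32_t Value) {
  const std::uint32_t LowByte = Value & 0xFFu;
  return Value == LowByte * 0x01010101u;
}

// The three constants from the comment above.
void checkThreadExamples() {
  assert(isByteSplat(0u));          // store i32 0: bytes 00 00 00 00
  assert(isByteSplat(286331153u));  // 0x11111111: bytes 11 11 11 11
  assert(!isByteSplat(1u));         // 0x00000001: bytes 01 00 00 00
}
```

The value 1 fails because its bytes are 01 00 00 00, so no single i8 fill value reproduces it.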

Harbormaster completed remote builds in B127090: Diff 377258. Oct 5 2021, 8:48 AM

In D109280#3043121, @vdsered wrote:

Hi, @xbolva00. Are you sure that we can use loop idiom to transform an arbitrary loop with a store into, let's say, a memset?

Regarding memset, it accepts an i8 value. I know it can transform a loop with stores like store i32 0, i32* %9 or e.g. store i32 286331153, i32* %9, but it would fail on an example where we write store i32 1, i32* %9

Ah, right.

I think this would have nontrivial impact on compile time (@nikic ?) and the results from testsuite do not look so promising - I would expect more “hits” to justify (small if possible) compile time regression.

In D109280#3043179, @xbolva00 wrote:

I think this would have nontrivial impact on compile time (@nikic ?) and the results from testsuite do not look so promising - I would expect more “hits” to justify (small if possible) compile time regression.

Yes, this has a large negative effect: https://llvm-compile-time-tracker.com/compare.php?from=64eaffb613d0cb7fa7542fa48281a2e617ad8ee9&to=6e7452757e16c4260fa9a5862761a68ed778dbf9&stat=instructions

@vdsered are you still planning on pushing this forward?

In D109280#3043563, @nikic wrote:

In D109280#3043179, @xbolva00 wrote:

I think this would have nontrivial impact on compile time (@nikic ?) and the results from testsuite do not look so promising - I would expect more “hits” to justify (small if possible) compile time regression.

Yes, this has a large negative effect: https://llvm-compile-time-tracker.com/compare.php?from=64eaffb613d0cb7fa7542fa48281a2e617ad8ee9&to=6e7452757e16c4260fa9a5862761a68ed778dbf9&stat=instructions

Agreed that the compile-time impact looks very large, but I *think* it may not be as bad as it looks at the moment. There seems to be a bug in the code that means we effectively run the main DSE loop twice if EnableOptimizationAcrossLoops = false (as in this patch).

The patch unconditionally runs the whole main DSE loop again, as below. prepareStateForAcrossLoopOptimization has an early exit if !EnableOptimizationAcrossLoops. Otherwise it clears MemDefs and roughly only adds MemDefs with loop ranges. As a consequence, if !EnableOptimizationAcrossLoops we process *all* MemoryDefs again.

MadeChange |= State.prepareStateForAcrossLoopOptimization();
MadeChange |= runDSEOptimizationLoop(State);

I tried to rebase the patches and collect compile-time numbers with that issue fixed: https://llvm-compile-time-tracker.com/compare.php?from=f622c7b7d33b211517d8fe4f725d1028d786fc08&to=00a8810b123e606c19d9926d11183318323a8752&stat=instructions

NewPM-O3: +0.24%
NewPM-ReleaseThinLTO: +0.39%
NewPM-ReleaseLTO-g: +0.26%

We should still see if we can get those down further, but I think they look more encouraging to start with.
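
The double-run issue described above can be modelled with a toy worklist. This is only a sketch of one possible guard, not the code in the patch or in the rebased branch: ToyState, prepareAcrossLoops, runMainLoop and runBoth are hypothetical names, and integers stand in for MemoryDefs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for DSEState; every name here is hypothetical.
struct ToyState {
  std::vector<int> MemDefs;     // worklist visited by the main loop
  std::vector<int> LoopStores;  // stores with a known loop range
  bool EnableAcrossLoops;       // models EnableOptimizationAcrossLoops
  std::size_t Visited;          // how many worklist entries were processed
};

// Rebuilds the worklist for the second phase. Returns false when the
// second phase should be skipped, leaving the worklist untouched.
bool prepareAcrossLoops(ToyState &S) {
  if (!S.EnableAcrossLoops)
    return false;
  S.MemDefs = S.LoopStores;
  return !S.MemDefs.empty();
}

// Models the main DSE loop: visits every entry in the worklist once.
void runMainLoop(ToyState &S) { S.Visited += S.MemDefs.size(); }

// Runs the normal phase, then the across-loops phase only if it was
// actually prepared. Without the guard, a disabled flag would make the
// second call revisit the whole original worklist.
std::size_t runBoth(ToyState S) {
  assert(S.Visited == 0 && "expects a fresh state");
  runMainLoop(S);
  if (prepareAcrossLoops(S))
    runMainLoop(S);
  return S.Visited;
}
```

With five worklist entries and one loop store, the guarded version visits five entries when the flag is off and six when it is on. An unguarded second run with the flag off would visit ten.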

In D109280#3291535, @fhahn wrote:
@vdsered are you still planning on pushing this forward?

In D109280#3043563, @nikic wrote:

In D109280#3043179, @xbolva00 wrote:

I think this would have nontrivial impact on compile time (@nikic ?) and the results from testsuite do not look so promising - I would expect more “hits” to justify (small if possible) compile time regression.

Yes, this has a large negative effect: https://llvm-compile-time-tracker.com/compare.php?from=64eaffb613d0cb7fa7542fa48281a2e617ad8ee9&to=6e7452757e16c4260fa9a5862761a68ed778dbf9&stat=instructions

Agreed that the compile-time impact looks very large, but I *think* it may not be as bad as it looks at the moment. There seems to be a bug in the code that means we effectively run the main DSE loop twice if EnableOptimizationAcrossLoops = false (as in this patch).

The patch unconditionally runs the whole main DSE loop again, as below. prepareStateForAcrossLoopOptimization has an early exit if !EnableOptimizationAcrossLoops. Otherwise it clears MemDefs and roughly only adds MemDefs with loop ranges. As a consequence, if !EnableOptimizationAcrossLoops we process *all* MemoryDefs again.
MadeChange |= State.prepareStateForAcrossLoopOptimization();
MadeChange |= runDSEOptimizationLoop(State);
I tried to rebase the patches and collect compile-time numbers with that issue fixed: https://llvm-compile-time-tracker.com/compare.php?from=f622c7b7d33b211517d8fe4f725d1028d786fc08&to=00a8810b123e606c19d9926d11183318323a8752&stat=instructions

NewPM-O3: +0.24%
NewPM-ReleaseThinLTO: +0.39%
NewPM-ReleaseLTO-g: +0.26%

We should still see if we can get those down further, but I think they look more encouraging to start with.

@fhahn Yes, I do

Thank you for this analysis.

I think there are potential enhancements. For example, loop transformations probably shouldn't be done in this pass. It would probably be better to move this whole optimization into a specialized pass and run it closer to the other loop passes in the default pipeline, because loops would already be in the right form for free, loop deletion would remove loops that become empty after DSE, and so on. However, I'm still not sure whether this is a better idea than implementing it right here in DSE.

Plus, it'd be good to see how this behaves on larger projects (LLVM itself and so on).

In D109280#3292637, @vdsered wrote:

snip

I think there are potential enhancements. For example, loop transformations probably shouldn't be done in this pass. It would probably be better to move this whole optimization into a specialized pass and run it closer to the other loop passes in the default pipeline, because loops would already be in the right form for free, loop deletion would remove loops that become empty after DSE, and so on. However, I'm still not sure whether this is a better idea than implementing it right here in DSE.

DSE shouldn't rotate loops, yes! Not sure if moving it to a separate pass is necessary to start with. The first set of loop optimizations should already be completed before DSE runs.

Plus, it'd be good to see how this behaves on larger projects (LLVM itself and so on).

Sure. Another interesting data point would be the impact with variable auto-init enabled. Also, another motivating case was raised as issue https://github.com/llvm/llvm-project/issues/53473

Rebased
If EnableOptimizationAcrossLoops is false, then DSE does not run twice
Added two more negative test cases
Removed loop transformations from this pass
Skip a killing store that is not guaranteed to execute

Harbormaster completed remote builds in B147776: Diff 406195. Feb 5 2022, 11:41 AM

Rebase
Simplified this patch

Herald added a project: Restricted Project. Aug 7 2022, 3:16 AM

Harbormaster completed remote builds in B179765: Diff 450613. Aug 7 2022, 5:12 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp	270 lines
llvm/test/Transforms/DeadStoreElimination/loop-variant-store-complete-overwrite.ll	109 lines

Diff 450613

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

Show First 20 Lines • Show All 45 Lines • ▼ Show 20 Lines
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/MemoryLocation.h"		#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/MemorySSA.h"		#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"		#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/MustExecute.h"		#include "llvm/Analysis/MustExecute.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/ScalarEvolution.h"
		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
STATISTIC(NumGetDomMemoryDefPassed,		STATISTIC(NumGetDomMemoryDefPassed,
"Number of times a valid candidate is returned from getDomMemoryDef");		"Number of times a valid candidate is returned from getDomMemoryDef");
STATISTIC(NumDomMemDefChecks,		STATISTIC(NumDomMemDefChecks,
"Number iterations check for reads in getDomMemoryDef");		"Number iterations check for reads in getDomMemoryDef");

DEBUG_COUNTER(MemorySSACounter, "dse-memoryssa",		DEBUG_COUNTER(MemorySSACounter, "dse-memoryssa",
"Controls which MemoryDefs are eliminated.");		"Controls which MemoryDefs are eliminated.");

		// If enabled, this optimizes loop patterns like this
		// memset(P, 0, sizeof(int) * N); <- removed
		// for (int i = 0; i < N; ++i)
		// P[i] = 1;
		static cl::opt<bool> EnableOptimizationAcrossLoops(
		"enable-dse-optimize-across-loops", cl::init(true), cl::Hidden,
		cl::desc("Enable elimination for redundant "
		"loop-variant stores and memory intrinsics"));

		static cl::opt<unsigned> AcrossLoopStoreThreshold(
		"dse-across-loop-store-threshold", cl::init(4),
		cl::Hidden,
		cl::desc("The maximum number of stores across loops to optimize"));

static cl::opt<bool>		static cl::opt<bool>
EnablePartialOverwriteTracking("enable-dse-partial-overwrite-tracking",		EnablePartialOverwriteTracking("enable-dse-partial-overwrite-tracking",
cl::init(true), cl::Hidden,		cl::init(true), cl::Hidden,
cl::desc("Enable partial-overwrite tracking in DSE"));		cl::desc("Enable partial-overwrite tracking in DSE"));

static cl::opt<bool>		static cl::opt<bool>
EnablePartialStoreMerging("enable-dse-partial-store-merging",		EnablePartialStoreMerging("enable-dse-partial-store-merging",
cl::init(true), cl::Hidden,		cl::init(true), cl::Hidden,
▲ Show 20 Lines • Show All 600 Lines • ▼ Show 20 Lines	bool canSkipDef(MemoryDef *D, bool DefVisibleToCaller) {

// Skip intrinsics that do not really read or modify memory.		// Skip intrinsics that do not really read or modify memory.
if (isNoopIntrinsic(DI))		if (isNoopIntrinsic(DI))
return true;		return true;

return false;		return false;
}		}

		struct ContinuousMemoryRange {
		const SCEV *Start;
		const SCEV *Length;

		ContinuousMemoryRange(const SCEV *Start, const SCEV *Length)
		: Start(Start), Length(Length) {}

		static ContinuousMemoryRange createEmpty() {
		return ContinuousMemoryRange(nullptr, nullptr);
		}

		bool isEmpty() { return Start == nullptr || Length == nullptr; }

		bool operator==(const ContinuousMemoryRange &Other) const {
		return Start == Other.Start && Length == Other.Length;
		}

		bool operator!=(const ContinuousMemoryRange &Other) const {
		return !(*this == Other);
		}
		};

struct DSEState {		struct DSEState {
Function &F;		Function &F;
AliasAnalysis &AA;		AliasAnalysis &AA;
EarliestEscapeInfo EI;		EarliestEscapeInfo EI;
		ScalarEvolution *SE;

/// The single BatchAA instance that is used to cache AA queries. It will		/// The single BatchAA instance that is used to cache AA queries. It will
/// not be invalidated over the whole run. This is safe, because:		/// not be invalidated over the whole run. This is safe, because:
/// 1. Only memory writes are removed, so the alias cache for memory		/// 1. Only memory writes are removed, so the alias cache for memory
/// locations remains valid.		/// locations remains valid.
/// 2. No new instructions are added (only instructions removed), so cached		/// 2. No new instructions are added (only instructions removed), so cached
/// information for a deleted value cannot be accessed by a re-used new		/// information for a deleted value cannot be accessed by a re-used new
/// value pointer.		/// value pointer.
Show All 23 Lines	struct DSEState {
SmallPtrSet<BasicBlock *, 16> ThrowingBlocks;		SmallPtrSet<BasicBlock *, 16> ThrowingBlocks;
// Post-order numbers for each basic block. Used to figure out if memory		// Post-order numbers for each basic block. Used to figure out if memory
// accesses are executed before another access.		// accesses are executed before another access.
DenseMap<BasicBlock *, unsigned> PostOrderNumbers;		DenseMap<BasicBlock *, unsigned> PostOrderNumbers;
// Values that are only used with assumes. Used to refine pointer escape		// Values that are only used with assumes. Used to refine pointer escape
// analysis.		// analysis.
SmallPtrSet<const Value *, 32> EphValues;		SmallPtrSet<const Value *, 32> EphValues;

		DenseMap<Instruction *, ContinuousMemoryRange> MemRanges;

/// Keep track of instructions (partly) overlapping with killing MemoryDefs per		/// Keep track of instructions (partly) overlapping with killing MemoryDefs per
/// basic block.		/// basic block.
MapVector<BasicBlock *, InstOverlapIntervalsTy> IOLs;		MapVector<BasicBlock *, InstOverlapIntervalsTy> IOLs;
// Check if there are root nodes that are terminated by UnreachableInst.		// Check if there are root nodes that are terminated by UnreachableInst.
// Those roots pessimize post-dominance queries. If there are such roots,		// Those roots pessimize post-dominance queries. If there are such roots,
// fall back to CFG scan starting from all non-unreachable roots.		// fall back to CFG scan starting from all non-unreachable roots.
bool AnyUnreachableExit;		bool AnyUnreachableExit;

// Whether or not we should iterate on removing dead stores at the end of the		// Whether or not we should iterate on removing dead stores at the end of the
// function due to removing a store causing a previously captured pointer to		// function due to removing a store causing a previously captured pointer to
// no longer be captured.		// no longer be captured.
bool ShouldIterateEndOfFunctionDSE;		bool ShouldIterateEndOfFunctionDSE;

// Class contains self-reference, make sure it's not copied/moved.		// Class contains self-reference, make sure it's not copied/moved.
DSEState(const DSEState &) = delete;		DSEState(const DSEState &) = delete;
DSEState &operator=(const DSEState &) = delete;		DSEState &operator=(const DSEState &) = delete;

		void prepareStateForAcrossLoopOptimization() {
		MemDefs.clear();
		SkipStores.clear();
		for (auto *L : LI) {
		if (!canAddMoreStoresFromLoop()) {
		break;
		}
		if (!L->isInnermost() || !L->isOutermost()) {
		continue;
		}

		if (L->getNumBlocks() != 1) {
		continue;
		}

		prepareLoopVariantStoresWithRanges(L);
		}
		}

DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,		DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,
PostDominatorTree &PDT, AssumptionCache &AC,		PostDominatorTree &PDT, ScalarEvolution *SE, AssumptionCache &AC,
const TargetLibraryInfo &TLI, const LoopInfo &LI)		const TargetLibraryInfo &TLI, const LoopInfo &LI)
: F(F), AA(AA), EI(DT, LI, EphValues), BatchAA(AA, &EI), MSSA(MSSA),		: F(F), AA(AA), EI(DT, LI, EphValues), BatchAA(AA, &EI), MSSA(MSSA),
DT(DT), PDT(PDT), TLI(TLI), DL(F.getParent()->getDataLayout()), LI(LI) {		DT(DT), PDT(PDT), TLI(TLI), DL(F.getParent()->getDataLayout()), LI(LI), SE(SE) {
// Collect blocks with throwing instructions not modeled in MemorySSA and		// Collect blocks with throwing instructions not modeled in MemorySSA and
// alloc-like objects.		// alloc-like objects.
unsigned PO = 0;		unsigned PO = 0;
for (BasicBlock *BB : post_order(&F)) {		for (BasicBlock *BB : post_order(&F)) {
PostOrderNumbers[BB] = PO++;		PostOrderNumbers[BB] = PO++;
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
MemoryAccess *MA = MSSA.getMemoryAccess(&I);		MemoryAccess *MA = MSSA.getMemoryAccess(&I);
if (I.mayThrow() && !MA)		if (I.mayThrow() && !MA)
Show All 17 Lines	DSEState(Function &F, AliasAnalysis &AA, MemorySSA &MSSA, DominatorTree &DT,

AnyUnreachableExit = any_of(PDT.roots(), [](const BasicBlock *E) {		AnyUnreachableExit = any_of(PDT.roots(), [](const BasicBlock *E) {
return isa<UnreachableInst>(E->getTerminator());		return isa<UnreachableInst>(E->getTerminator());
});		});

CodeMetrics::collectEphemeralValues(&F, &AC, EphValues);		CodeMetrics::collectEphemeralValues(&F, &AC, EphValues);
}		}

		bool canAddMoreStoresFromLoop() {
		return MemDefs.size() <= AcrossLoopStoreThreshold;
		}

		void prepareLoopVariantStoresWithRanges(Loop *L) {
		auto *BB = L->getLoopLatch();
		if (ThrowingBlocks.count(BB))
		return;

		for (auto &I : *BB) {
		if (I.mayThrow())
		break;
		if (auto *S = dyn_cast<StoreInst>(&I))
		if (auto *MD = dyn_cast<MemoryDef>(MSSA.getMemoryAccess(S))) {
		if (S->isVolatile() || S->isAtomic()) {
		continue;
		}

		auto Range = computeMemoryRange(L, S);
		if (!Range.isEmpty()) {
		MemDefs.push_back(MD);
		addMemRange(S, Range);
		// Do not waste resources on analyzing this block further
		// One store should be enough for eliminating
		// memsets or stores that killed by this loop
		return;
		}
		}
		}
		}

		void addMemRange(Instruction *I, ContinuousMemoryRange &Range) {
		if (Range.isEmpty())
		return;
		MemRanges.insert({I, Range});
		}

		ContinuousMemoryRange computeMemoryRange(const AnyMemIntrinsic *MemIntr) {
		const auto *Length = SE->getSCEV(MemIntr->getLength());
		if (isa<SCEVCouldNotCompute>(Length))
		return ContinuousMemoryRange::createEmpty();
		auto *Start = SE->getSCEV(MemIntr->getDest());
		return ContinuousMemoryRange(Start, Length);
		}

		ContinuousMemoryRange computeMemoryRange(Loop *L, StoreInst *Store) {
		auto *GEP = dyn_cast<GetElementPtrInst>(Store->getPointerOperand());
		if (!GEP)
		return ContinuousMemoryRange::createEmpty();

		// GEP must be in the same loop as store that uses it
		if (!L->contains(GEP))
		return ContinuousMemoryRange::createEmpty();

		auto *Type = Store->getValueOperand()->getType();
		if (!Type->isFloatingPointTy() && !Type->isIntegerTy())
		return ContinuousMemoryRange::createEmpty();

		auto *StoreRec = dyn_cast<SCEVAddRecExpr>(SE->getSCEV(GEP));
		if (!StoreRec)
		return ContinuousMemoryRange::createEmpty();

		if (!StoreRec->isAffine())
		return ContinuousMemoryRange::createEmpty();

		auto *BTC = SE->getBackedgeTakenCount(L);
		if (isa<SCEVCouldNotCompute>(BTC))
		return ContinuousMemoryRange::createEmpty();

		const auto *Start = StoreRec->getOperand(0);
		if (isa<SCEVCouldNotCompute>(Start))
		return ContinuousMemoryRange::createEmpty();

		auto *Count = SE->getAddExpr(BTC, SE->getOne(BTC->getType()));
		auto SizeInBytes =
		Store->getValueOperand()->getType()->getScalarSizeInBits() / 8;
		auto *StoreSize = SE->getConstant(Count->getType(), SizeInBytes);

		auto *Step = StoreRec->getOperand(1);
		if (Step != StoreSize)
		return ContinuousMemoryRange::createEmpty();

		const auto *Length = SE->getMulExpr(Count, StoreSize);
		if (isa<SCEVCouldNotCompute>(Length))
		return ContinuousMemoryRange::createEmpty();
		return ContinuousMemoryRange(Start, Length);
		}

		bool isOptimizableAcrossLoops(const Instruction *KillingI,
		const Instruction *DeadI) {
		if (!EnableOptimizationAcrossLoops)
		return false;

		// Alias Analysis can handle two memory intrinsics, skip it
		if (isa<AnyMemIntrinsic>(DeadI) && isa<AnyMemIntrinsic>(KillingI))
		return false;
		return true;
		}


/// Return 'OW_Complete' if a store to the 'KillingLoc' location (by \p		/// Return 'OW_Complete' if a store to the 'KillingLoc' location (by \p
/// KillingI instruction) completely overwrites a store to the 'DeadLoc'		/// KillingI instruction) completely overwrites a store to the 'DeadLoc'
/// location (by \p DeadI instruction).		/// location (by \p DeadI instruction).
/// Return OW_MaybePartial if \p KillingI does not completely overwrite		/// Return OW_MaybePartial if \p KillingI does not completely overwrite
/// \p DeadI, but they both write to the same underlying object. In that		/// \p DeadI, but they both write to the same underlying object. In that
/// case, use isPartialOverwrite to check if \p KillingI partially overwrites		/// case, use isPartialOverwrite to check if \p KillingI partially overwrites
/// \p DeadI. Returns 'OR_None' if \p KillingI is known to not overwrite the		/// \p DeadI. Returns 'OR_None' if \p KillingI is known to not overwrite the
/// \p DeadI. Returns 'OW_Unknown' if nothing can be determined.		/// \p DeadI. Returns 'OW_Unknown' if nothing can be determined.
OverwriteResult isOverwrite(const Instruction *KillingI,		OverwriteResult isOverwrite(const Instruction *KillingI,
const Instruction *DeadI,		const Instruction *DeadI,
const MemoryLocation &KillingLoc,		const MemoryLocation &KillingLoc,
const MemoryLocation &DeadLoc,		const MemoryLocation &DeadLoc,
int64_t &KillingOff, int64_t &DeadOff) {		int64_t &KillingOff, int64_t &DeadOff) {

		if (isOptimizableAcrossLoops(KillingI, DeadI))
		if (isLoopVariantStoreOverwrite(KillingI, DeadI))
		return OW_Complete;
// AliasAnalysis does not always account for loops. Limit overwrite checks		// AliasAnalysis does not always account for loops. Limit overwrite checks
// to dependencies for which we can guarantee they are independent of any		// to dependencies for which we can guarantee they are independent of any
// loops they are in.		// loops they are in.
if (!isGuaranteedLoopIndependent(DeadI, KillingI, DeadLoc))		if (!isGuaranteedLoopIndependent(DeadI, KillingI, DeadLoc))
return OW_Unknown;		return OW_Unknown;

const Value *DeadPtr = DeadLoc.Ptr->stripPointerCasts();		const Value *DeadPtr = DeadLoc.Ptr->stripPointerCasts();
const Value *KillingPtr = KillingLoc.Ptr->stripPointerCasts();		const Value *KillingPtr = KillingLoc.Ptr->stripPointerCasts();
▲ Show 20 Lines • Show All 475 Lines • ▼ Show 20 Lines	for (;; Current = cast<MemoryDef>(Current)->getDefiningAccess()) {
CanOptimize = false;		CanOptimize = false;
continue;		continue;
}		}

// AliasAnalysis does not account for loops. Limit elimination to		// AliasAnalysis does not account for loops. Limit elimination to
// candidates for which we can guarantee they always store to the same		// candidates for which we can guarantee they always store to the same
// memory location and not located in different loops.		// memory location and not located in different loops.
if (!isGuaranteedLoopIndependent(CurrentI, KillingI, *CurrentLoc)) {		if (!isGuaranteedLoopIndependent(CurrentI, KillingI, *CurrentLoc)) {
		if (!EnableOptimizationAcrossLoops || MemRanges.empty()) {
LLVM_DEBUG(dbgs() << " ... not guaranteed loop independent\n");		LLVM_DEBUG(dbgs() << " ... not guaranteed loop independent\n");
CanOptimize = false;		CanOptimize = false;
continue;		continue;
}		}
		}

if (IsMemTerm) {		if (IsMemTerm) {
// If the killing def is a memory terminator (e.g. lifetime.end), check		// If the killing def is a memory terminator (e.g. lifetime.end), check
// the next candidate if the current Current does not write the same		// the next candidate if the current Current does not write the same
// underlying object as the terminator.		// underlying object as the terminator.
if (!isMemTerminator(*CurrentLoc, CurrentI, KillingI)) {		if (!isMemTerminator(*CurrentLoc, CurrentI, KillingI)) {
CanOptimize = false;		CanOptimize = false;
continue;		continue;
▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	for (unsigned I = 0; I < WorkList.size(); I++) {
return None;		return None;
}		}

// If this worklist walks back to the original memory access (and the		// If this worklist walks back to the original memory access (and the
// pointer is not guarenteed loop invariant) then we cannot assume that a		// pointer is not guarenteed loop invariant) then we cannot assume that a
// store kills itself.		// store kills itself.
if (MaybeDeadAccess == UseAccess &&		if (MaybeDeadAccess == UseAccess &&
!isGuaranteedLoopInvariant(MaybeDeadLoc.Ptr)) {		!isGuaranteedLoopInvariant(MaybeDeadLoc.Ptr)) {
LLVM_DEBUG(dbgs() << " ... found not loop invariant self access\n");		if (!EnableOptimizationAcrossLoops || !MemRanges.count(MaybeDeadI)) {
		LLVM_DEBUG(dbgs()
		<< " ... found not loop invariant self access\n");
return None;		return None;
}		}
		}
// Otherwise, for the KillingDef and MaybeDeadAccess we only have to check		// Otherwise, for the KillingDef and MaybeDeadAccess we only have to check
// if it reads the memory location.		// if it reads the memory location.
// TODO: It would probably be better to check for self-reads before		// TODO: It would probably be better to check for self-reads before
// calling the function.		// calling the function.
if (KillingDef == UseAccess || MaybeDeadAccess == UseAccess) {		if (KillingDef == UseAccess || MaybeDeadAccess == UseAccess) {
LLVM_DEBUG(dbgs() << " ... skipping killing def/dom access\n");		LLVM_DEBUG(dbgs() << " ... skipping killing def/dom access\n");
continue;		continue;
}		}
Show All 30 Lines	getDomMemoryDef(MemoryDef *KillingDef, MemoryAccess *StartAccess,

// For accesses to locations visible after the function returns, make sure		// For accesses to locations visible after the function returns, make sure
// that the location is dead (=overwritten) along all paths from		// that the location is dead (=overwritten) along all paths from
// MaybeDeadAccess to the exit.		// MaybeDeadAccess to the exit.
if (!isInvisibleToCallerAfterRet(KillingUndObj)) {		if (!isInvisibleToCallerAfterRet(KillingUndObj)) {
SmallPtrSet<BasicBlock *, 16> KillingBlocks;		SmallPtrSet<BasicBlock *, 16> KillingBlocks;
for (Instruction *KD : KillingDefs)		for (Instruction *KD : KillingDefs)
KillingBlocks.insert(KD->getParent());		KillingBlocks.insert(KD->getParent());

assert(!KillingBlocks.empty() &&		assert(!KillingBlocks.empty() &&
"Expected at least a single killing block");		"Expected at least a single killing block");

// Find the common post-dominator of all killing blocks.		// Find the common post-dominator of all killing blocks.
BasicBlock *CommonPred = *KillingBlocks.begin();		BasicBlock *CommonPred = *KillingBlocks.begin();
for (BasicBlock *BB : llvm::drop_begin(KillingBlocks)) {		for (BasicBlock *BB : llvm::drop_begin(KillingBlocks)) {
if (!CommonPred)		if (!CommonPred)
break;		break;
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	if (!isInvisibleToCallerAfterRet(KillingUndObj)) {
NumCFGSuccess++;		NumCFGSuccess++;
}		}

// No aliasing MemoryUses of MaybeDeadAccess found, MaybeDeadAccess is		// No aliasing MemoryUses of MaybeDeadAccess found, MaybeDeadAccess is
// potentially dead.		// potentially dead.
return {MaybeDeadAccess};		return {MaybeDeadAccess};
}		}

		bool isLoopVariantStoreOverwrite(const Instruction *KillingI,
		const Instruction *DeadI) {
		auto KillingRange = MemRanges.find(KillingI);
		auto DeadRange = MemRanges.find(DeadI);

		if (MemRanges.count(DeadI) && MemRanges.count(KillingI))
		return DeadRange->second == KillingRange->second;

		if (MemRanges.count(DeadI) && isa<AnyMemIntrinsic>(KillingI)) {
		auto MemIntrRange = computeMemoryRange(cast<AnyMemIntrinsic>(KillingI));
		if (!MemIntrRange.isEmpty())
		return DeadRange->second == MemIntrRange;
		} else if (MemRanges.count(KillingI) && isa<AnyMemIntrinsic>(DeadI)) {
		auto MemIntrRange = computeMemoryRange(cast<AnyMemIntrinsic>(DeadI));
		if (!MemIntrRange.isEmpty())
		return KillingRange->second == MemIntrRange;
		}

		return false;
		}

// Delete dead memory defs		// Delete dead memory defs
void deleteDeadInstruction(Instruction *SI) {		void deleteDeadInstruction(Instruction *SI) {
MemorySSAUpdater Updater(&MSSA);		MemorySSAUpdater Updater(&MSSA);
SmallVector<Instruction *, 32> NowDeadInsts;		SmallVector<Instruction *, 32> NowDeadInsts;
NowDeadInsts.push_back(SI);		NowDeadInsts.push_back(SI);
--NumFastOther;		--NumFastOther;

while (!NowDeadInsts.empty()) {		while (!NowDeadInsts.empty()) {
[350 lines hidden]	for (auto *Def : MemDefs) {
deleteDeadInstruction(DefInst);		deleteDeadInstruction(DefInst);
NumRedundantStores++;		NumRedundantStores++;
MadeChange = true;		MadeChange = true;
}		}
return MadeChange;		return MadeChange;
}		}
};		};

static bool eliminateDeadStores(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,		static bool runDSEOptimizationLoop(DSEState &State) {
DominatorTree &DT, PostDominatorTree &PDT,
AssumptionCache &AC,
const TargetLibraryInfo &TLI,
const LoopInfo &LI) {
bool MadeChange = false;		bool MadeChange = false;

MSSA.ensureOptimizedUses();
DSEState State(F, AA, MSSA, DT, PDT, AC, TLI, LI);
// For each store:		// For each store:
for (unsigned I = 0; I < State.MemDefs.size(); I++) {		for (unsigned I = 0; I < State.MemDefs.size(); I++) {
MemoryDef *KillingDef = State.MemDefs[I];		MemoryDef *KillingDef = State.MemDefs[I];
if (State.SkipStores.count(KillingDef))		if (State.SkipStores.count(KillingDef))
continue;		continue;
Instruction *KillingI = KillingDef->getMemoryInst();		Instruction *KillingI = KillingDef->getMemoryInst();

Optional<MemoryLocation> MaybeKillingLoc;		Optional<MemoryLocation> MaybeKillingLoc;
[92 lines hidden]	for (unsigned I = 0; I < ToCheck.size(); I++) {
}		}

if (EnablePartialStoreMerging && OR == OW_PartialEarlierWithFullLater) {		if (EnablePartialStoreMerging && OR == OW_PartialEarlierWithFullLater) {
auto *DeadSI = dyn_cast<StoreInst>(DeadI);		auto *DeadSI = dyn_cast<StoreInst>(DeadI);
auto *KillingSI = dyn_cast<StoreInst>(KillingI);		auto *KillingSI = dyn_cast<StoreInst>(KillingI);
// We are re-using tryToMergePartialOverlappingStores, which requires		// We are re-using tryToMergePartialOverlappingStores, which requires
// DeadSI to dominate KillingSI.		// DeadSI to dominate KillingSI.
// TODO: implement tryToMergePartialOverlappingStores using MemorySSA.		// TODO: implement tryToMergePartialOverlappingStores using MemorySSA.
if (DeadSI && KillingSI && DT.dominates(DeadSI, KillingSI)) {		if (DeadSI && KillingSI && State.DT.dominates(DeadSI, KillingSI)) {
if (Constant *Merged = tryToMergePartialOverlappingStores(		if (Constant *Merged = tryToMergePartialOverlappingStores(
KillingSI, DeadSI, KillingOffset, DeadOffset, State.DL,		KillingSI, DeadSI, KillingOffset, DeadOffset, State.DL,
State.BatchAA, &DT)) {		State.BatchAA, &State.DT)) {

// Update stored value of earlier store to merged constant.		// Update stored value of earlier store to merged constant.
DeadSI->setOperand(0, Merged);		DeadSI->setOperand(0, Merged);
++NumModifiedStores;		++NumModifiedStores;
MadeChange = true;		MadeChange = true;

Shortend = true;		Shortend = true;
// Remove killing store and remove any outstanding overlap		// Remove killing store and remove any outstanding overlap
[31 lines hidden]	for (unsigned I = 0; I < State.MemDefs.size(); I++) {
if (!Shortend && State.tryFoldIntoCalloc(KillingDef, KillingUndObj)) {		if (!Shortend && State.tryFoldIntoCalloc(KillingDef, KillingUndObj)) {
LLVM_DEBUG(dbgs() << "DSE: Remove memset after forming calloc:\n"		LLVM_DEBUG(dbgs() << "DSE: Remove memset after forming calloc:\n"
<< " DEAD: " << *KillingI << '\n');		<< " DEAD: " << *KillingI << '\n');
State.deleteDeadInstruction(KillingI);		State.deleteDeadInstruction(KillingI);
MadeChange = true;		MadeChange = true;
continue;		continue;
}		}
}		}
		return MadeChange;
		}

		static bool runAcrossLoopDSE(DSEState &State) {
		if (!EnableOptimizationAcrossLoops) {
		return false;
		}
		State.prepareStateForAcrossLoopOptimization();
		return runDSEOptimizationLoop(State);
		}

		static bool eliminateDeadStores(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,
		DominatorTree &DT, PostDominatorTree &PDT,
		ScalarEvolution *SE,
		AssumptionCache &AC,
		const TargetLibraryInfo &TLI,
		const LoopInfo &LI) {

		MSSA.ensureOptimizedUses();
		DSEState State(F, AA, MSSA, DT, PDT, SE, AC, TLI, LI);
		bool MadeChange = false;

		MadeChange |= runDSEOptimizationLoop(State);
if (EnablePartialOverwriteTracking)		if (EnablePartialOverwriteTracking)
for (auto &KV : State.IOLs)		for (auto &KV : State.IOLs)
MadeChange |= State.removePartiallyOverlappedStores(KV.second);		MadeChange |= State.removePartiallyOverlappedStores(KV.second);

MadeChange |= State.eliminateRedundantStoresOfExistingValues();		MadeChange |= State.eliminateRedundantStoresOfExistingValues();
MadeChange |= State.eliminateDeadWritesAtEndOfFunction();		MadeChange |= State.eliminateDeadWritesAtEndOfFunction();
		// Optimize across loops at the end because there should be
		// fewer stores to analyze by then. In addition, stores that are
		// overwritten by other stores in the same loop have already been removed.
		MadeChange |= runAcrossLoopDSE(State);
return MadeChange;		return MadeChange;
}		}
} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// DSE Pass		// DSE Pass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {		PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {
AliasAnalysis &AA = AM.getResult<AAManager>(F);		AliasAnalysis &AA = AM.getResult<AAManager>(F);
const TargetLibraryInfo &TLI = AM.getResult<TargetLibraryAnalysis>(F);		const TargetLibraryInfo &TLI = AM.getResult<TargetLibraryAnalysis>(F);
DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);		DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);
MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();		MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
PostDominatorTree &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);		PostDominatorTree &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
AssumptionCache &AC = AM.getResult<AssumptionAnalysis>(F);		AssumptionCache &AC = AM.getResult<AssumptionAnalysis>(F);
LoopInfo &LI = AM.getResult<LoopAnalysis>(F);		LoopInfo &LI = AM.getResult<LoopAnalysis>(F);

bool Changed = eliminateDeadStores(F, AA, MSSA, DT, PDT, AC, TLI, LI);		ScalarEvolution *SE = nullptr;
		if (EnableOptimizationAcrossLoops)
		SE = &AM.getResult<ScalarEvolutionAnalysis>(F);

		bool Changed = eliminateDeadStores(F, AA, MSSA, DT, PDT, SE, AC, TLI, LI);

#ifdef LLVM_ENABLE_STATS		#ifdef LLVM_ENABLE_STATS
if (AreStatisticsEnabled())		if (AreStatisticsEnabled())
for (auto &I : instructions(F))		for (auto &I : instructions(F))
NumRemainingStores += isa<StoreInst>(&I);		NumRemainingStores += isa<StoreInst>(&I);
#endif		#endif

if (!Changed)		if (!Changed) {
return PreservedAnalyses::all();		return PreservedAnalyses::all();
		}
PreservedAnalyses PA;		PreservedAnalyses PA;
		if (EnableOptimizationAcrossLoops) {
		PA.preserve<ScalarEvolutionAnalysis>();
		}

PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
PA.preserve<MemorySSAAnalysis>();		PA.preserve<MemorySSAAnalysis>();
PA.preserve<LoopAnalysis>();		PA.preserve<LoopAnalysis>();
return PA;		return PA;
}		}

namespace {		namespace {

[15 lines hidden]	bool runOnFunction(Function &F) override {
const TargetLibraryInfo &TLI =		const TargetLibraryInfo &TLI =
getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
MemorySSA &MSSA = getAnalysis<MemorySSAWrapperPass>().getMSSA();		MemorySSA &MSSA = getAnalysis<MemorySSAWrapperPass>().getMSSA();
PostDominatorTree &PDT =		PostDominatorTree &PDT =
getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();		getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
AssumptionCache &AC =		AssumptionCache &AC =
getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();		LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
		ScalarEvolution *SE = nullptr;
bool Changed = eliminateDeadStores(F, AA, MSSA, DT, PDT, AC, TLI, LI);		if (EnableOptimizationAcrossLoops) {
		SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
		}
		bool Changed = eliminateDeadStores(F, AA, MSSA, DT, PDT, SE, AC, TLI, LI);

#ifdef LLVM_ENABLE_STATS		#ifdef LLVM_ENABLE_STATS
if (AreStatisticsEnabled())		if (AreStatisticsEnabled())
for (auto &I : instructions(F))		for (auto &I : instructions(F))
NumRemainingStores += isa<StoreInst>(&I);		NumRemainingStores += isa<StoreInst>(&I);
#endif		#endif

return Changed;		return Changed;
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
		if (EnableOptimizationAcrossLoops) {
		AU.addRequired<ScalarEvolutionWrapperPass>();
		AU.addPreserved<ScalarEvolutionWrapperPass>();
		}
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
AU.addRequired<PostDominatorTreeWrapperPass>();		AU.addRequired<PostDominatorTreeWrapperPass>();
AU.addRequired<MemorySSAWrapperPass>();		AU.addRequired<MemorySSAWrapperPass>();
AU.addPreserved<PostDominatorTreeWrapperPass>();		AU.addPreserved<PostDominatorTreeWrapperPass>();
[14 lines hidden]
INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)		INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
		INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
INITIALIZE_PASS_END(DSELegacyPass, "dse", "Dead Store Elimination", false,		INITIALIZE_PASS_END(DSELegacyPass, "dse", "Dead Store Elimination", false,
false)		false)

FunctionPass *llvm::createDeadStoreEliminationPass() {		FunctionPass *llvm::createDeadStoreEliminationPass() {
return new DSELegacyPass();		return new DSELegacyPass();
}		}
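
The `isLoopVariantStoreOverwrite` helper in this diff treats one access as completely overwriting another when their recorded memory ranges compare equal. The following is a minimal standalone model of that comparison, not the patch's implementation. It assumes a range is a half-open byte interval measured from a common base pointer and that the loop store has unit stride starting at index 0; `ByteRange`, `loopStoreRange` and `completelyOverwrites` are hypothetical names that do not exist in LLVM.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of a memory range: the half-open byte interval
// [Begin, End) relative to a single base pointer.
struct ByteRange {
  std::uint64_t Begin;
  std::uint64_t End;
  bool operator==(const ByteRange &O) const {
    return Begin == O.Begin && End == O.End;
  }
};

// Bytes written by a loop that stores StoreSize bytes at indices
// 0 .. TripCount-1 with unit stride. Returns nullopt when the range is
// empty or the byte count would overflow (assumption: give up in that case).
std::optional<ByteRange> loopStoreRange(std::uint64_t TripCount,
                                        std::uint64_t StoreSize) {
  if (TripCount == 0 || StoreSize == 0)
    return std::nullopt;
  if (TripCount > UINT64_MAX / StoreSize)
    return std::nullopt;
  return ByteRange{0, TripCount * StoreSize};
}

// The earlier write is dead only if the later one covers exactly the same bytes.
bool completelyOverwrites(const std::optional<ByteRange> &Killing,
                          const ByteRange &Dead) {
  return Killing.has_value() && *Killing == Dead;
}
```

With the numbers from the `@init_vector` test, a loop of 4096 four-byte stores yields the range [0, 16384), which equals the range of a 16384-byte memset at the same base.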

llvm/test/Transforms/DeadStoreElimination/loop-variant-store-complete-overwrite.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S --passes=dse < %s | FileCheck %s
				define dso_local void @init_vector(ptr %p) {
				; CHECK-LABEL: @init_vector(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[N_06:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[CONV:%.*]] = trunc i64 [[N_06]] to i32
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 [[N_06]]
				; CHECK-NEXT: store i32 [[CONV]], ptr [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[N_06]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], 4096
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[NRVO_SKIPDTOR:%.*]], label [[FOR_BODY]]
				; CHECK: nrvo.skipdtor:
				; CHECK-NEXT: ret void
				;
				entry:
				; 1 = MemoryDef(liveOnEntry)
				call void @llvm.memset.p0.i64(ptr %p, i8 0, i64 16384, i1 false)
				br label %for.body

				for.body: ; preds = %for.body, %entry
				; 3 = MemoryPhi({entry,1},{for.body,2})
				%n.06 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
				%conv = trunc i64 %n.06 to i32
				%arrayidx = getelementptr inbounds i32, ptr %p, i64 %n.06
				; 2 = MemoryDef(3)
				store i32 %conv, ptr %arrayidx, align 4
				%inc = add nuw nsw i64 %n.06, 1
				%exitcond.not = icmp eq i64 %inc, 4096
				br i1 %exitcond.not, label %nrvo.skipdtor, label %for.body

				nrvo.skipdtor: ; preds = %for.body
				ret void
				}

				define dso_local void @store_in_two_loops(ptr nocapture noundef writeonly %P, i32 noundef %N) {
				; CHECK-LABEL: @store_in_two_loops(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP17_NOT:%.*]] = icmp eq i32 [[N:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP17_NOT]], label [[FOR_COND_CLEANUP4:%.*]], label [[FOR_BODY_PREHEADER:%.*]]
				; CHECK: for.body.preheader:
				; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.cond2.preheader:
				; CHECK-NEXT: br i1 [[CMP17_NOT]], label [[FOR_COND_CLEANUP4]], label [[FOR_BODY5_PREHEADER:%.*]]
				; CHECK: for.body5.preheader:
				; CHECK-NEXT: [[WIDE_TRIP_COUNT25:%.*]] = zext i32 [[N]] to i64
				; CHECK-NEXT: br label [[FOR_BODY5:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P:%.*]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store i32 1, ptr [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND2_PREHEADER:%.*]], label [[FOR_BODY]]
				; CHECK: for.cond.cleanup4:
				; CHECK-NEXT: ret void
				; CHECK: for.body5:
				; CHECK-NEXT: [[INDVARS_IV22:%.*]] = phi i64 [ 0, [[FOR_BODY5_PREHEADER]] ], [ [[INDVARS_IV_NEXT23:%.*]], [[FOR_BODY5]] ]
				; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV22]]
				; CHECK-NEXT: store i32 1, ptr [[ARRAYIDX7]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT23]] = add nuw nsw i64 [[INDVARS_IV22]], 1
				; CHECK-NEXT: [[EXITCOND26_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT23]], [[WIDE_TRIP_COUNT25]]
				; CHECK-NEXT: br i1 [[EXITCOND26_NOT]], label [[FOR_COND_CLEANUP4]], label [[FOR_BODY5]]
				;
				entry:
				%cmp17.not = icmp eq i32 %N, 0
				br i1 %cmp17.not, label %for.cond.cleanup4, label %for.body.preheader

				for.body.preheader: ; preds = %entry
				%wide.trip.count = zext i32 %N to i64
				br label %for.body

				for.cond2.preheader: ; preds = %for.body
				br i1 %cmp17.not, label %for.cond.cleanup4, label %for.body5.preheader

				for.body5.preheader: ; preds = %for.cond2.preheader
				%wide.trip.count25 = zext i32 %N to i64
				br label %for.body5

				for.body: ; preds = %for.body, %for.body.preheader
				; 5 = MemoryPhi({for.body.preheader,liveOnEntry},{for.body,1})
				%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, ptr %P, i64 %indvars.iv
				; 1 = MemoryDef(5)
				store i32 1, ptr %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %for.cond2.preheader, label %for.body

				for.cond.cleanup4: ; preds = %for.body5, %for.cond2.preheader, %entry
				; 3 = MemoryPhi({entry,liveOnEntry},{for.cond2.preheader,1},{for.body5,2})
				ret void

				for.body5: ; preds = %for.body5, %for.body5.preheader
				; 4 = MemoryPhi({for.body5.preheader,1},{for.body5,2})
				%indvars.iv22 = phi i64 [ 0, %for.body5.preheader ], [ %indvars.iv.next23, %for.body5 ]
				%arrayidx7 = getelementptr inbounds i32, ptr %P, i64 %indvars.iv22
				; 2 = MemoryDef(4)
				store i32 1, ptr %arrayidx7, align 4
				%indvars.iv.next23 = add nuw nsw i64 %indvars.iv22, 1
				%exitcond26.not = icmp eq i64 %indvars.iv.next23, %wide.trip.count25
				br i1 %exitcond26.not, label %for.cond.cleanup4, label %for.body5
				}


				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)