This is an archive of the discontinued LLVM Phabricator instance.

Differential D105545

[MergedLoadStoreMotion] Conditional store elimination
Abandoned · Public

Authored by chill on Jul 7 2021, 3:46 AM.

Details

Reviewers

SjoerdMeijer
sanwou01
Gerolf
dendibakh
davide
mcrosier
jaykang10
fhahn
efriedma
bjope

Summary

This patch implements replacement of certain conditional stores with
unconditional ones (subject to constraints).

A triangle shaped part of the CFG like:

header:
  ...
  br %cond, label %if.then, label %if.end

if.then:
  store %s.new, %addr
  br label %if.end

if.end:
  ...

would become

header:
  ...
  br %cond, label %if.then, label %if.else

if.then:
  br label %if.end

if.else:
  %s.old = load %addr
  br label %if.end

if.end:
  %s = phi [%s.new, if.then], [%s.old, if.else]
  store %s, %addr
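
At the source level, the rewrite corresponds roughly to the C++ sketch below. The function names `store_if` and `store_always` are hypothetical and do not appear in the patch. The two functions are only equivalent for single-threaded code in which `*addr` is valid to read and write, which is what the conditions discussed next are meant to establish.

```cpp
#include <cassert>

// Original shape: the store executes only when cond is true.
void store_if(bool cond, int *addr, int s_new) {
  assert(addr != nullptr);
  if (cond)
    *addr = s_new;
}

// Transformed shape: the path that skipped the store now loads the old
// value, and one unconditional store writes whichever value was selected.
void store_always(bool cond, int *addr, int s_new) {
  assert(addr != nullptr);
  int s = cond ? s_new : *addr; // plays the role of the phi in if.end
  *addr = s;                    // the store now runs on every path
}
```

Note that `store_always` writes to `*addr` even when `cond` is false, which is exactly why an unconditional store can trap or race where the conditional one could not.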

This transformation is correct as long as the store:
a) is not volatile
b) does not introduce an invalid memory access in the new location
c) does not introduce a data race in the new location

For a) volatile stores are simply disqualified from the transformation.

To satisfy b) we can check that on all paths leading up to the end of the
header block:

- the program already contains a write (or, for local objects only, a read) to precisely the same memory location, and
- for non-local objects only, following that write, there is no instruction that could possibly make subsequent writes invalid (local objects are considered always writable).

To satisfy c), for local objects only, we can check that the address of the
object does not escape the function.

For non-local objects or for escaping local objects we can check that on all
paths leading up to the end of the header block:

- the program already contains a write to precisely the same memory location, and
- following that write, there is no instruction that could possibly be the tail edge of a "synchronizes-with" relation.

If the candidate store is to a local variable, we first traverse the users of
the alloca instruction, noting whether the address escapes and whether a load
or a store to the same address dominates the candidate store (domination is a
stronger constraint than the above "on all paths" one).

Failing that, or for stores to non-local objects, we then traverse the MemorySSA
graph, starting from the MemoryDef that corresponds to the candidate
store. During that traversal:

- if we reach the initial liveOnEntry, nothing is guaranteed and we fail
- if we reach a call to an unknown function, a volatile memory access, or an atomic memory access, we bail
- if we reach a simple store to the same memory location, we stop traversing upwards from this MemoryDef only
- otherwise we continue the traversal to the incoming MemoryDef

If we didn't bail anywhere in the above traversal, the transformation is
considered correct.
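
The upward walk described above can be sketched with a toy graph in plain C++. This is only an illustration under simplifying assumptions: `Kind`, `Access` and `isLegalUpwardWalk` are hypothetical names, not LLVM's MemorySSA classes, and every unsafe instruction (unknown call, volatile or atomic access) is collapsed into a single `Barrier` kind.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for a memory access node; not LLVM's real MemoryAccess.
enum class Kind {
  LiveOnEntry, // function entry: nothing is known about memory
  Phi,         // merge of several incoming memory states
  SameStore,   // simple store to exactly the candidate's location
  OtherDef,    // simple store to some other location
  Barrier      // unknown call, volatile access, or atomic access
};

struct Access {
  Kind kind;
  std::vector<std::size_t> incoming; // indices of the defining accesses
};

// Walks upwards from the accesses that define the candidate store. Returns
// true only if every path reaches a SameStore before reaching the function
// entry or a Barrier.
bool isLegalUpwardWalk(const std::vector<Access> &graph,
                       const std::vector<std::size_t> &start) {
  std::vector<std::size_t> worklist(start);
  std::set<std::size_t> visitedPhis;
  while (!worklist.empty()) {
    std::size_t idx = worklist.back();
    worklist.pop_back();
    assert(idx < graph.size());
    const Access &a = graph[idx];
    switch (a.kind) {
    case Kind::LiveOnEntry:
    case Kind::Barrier:
      return false; // nothing is guaranteed on this path
    case Kind::SameStore:
      break; // this path is covered; do not walk above it
    case Kind::Phi:
      // Expand each phi once, so that loops in the CFG terminate.
      if (visitedPhis.insert(idx).second)
        worklist.insert(worklist.end(), a.incoming.begin(), a.incoming.end());
      break;
    case Kind::OtherDef:
      worklist.insert(worklist.end(), a.incoming.begin(), a.incoming.end());
      break;
    }
  }
  return true;
}
```

The walk fails as soon as a single path is unsafe, matching the "on all paths" wording of the conditions. A worklist is used instead of recursion so that deep chains of accesses do not grow the call stack.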

Diff Detail

Unit Tests: Failed

	Time	Test
	2,900 ms	x64 debian > libarcher.critical::critical.c
	2,680 ms	x64 debian > libarcher.critical::lock-nested.c
	2,880 ms	x64 debian > libarcher.parallel::parallel-simple.c
	2,910 ms	x64 debian > libarcher.races::critical-unrelated.c
	2,660 ms	x64 debian > libarcher.races::lock-nested-unrelated.c
		View Full Test Results (20 Failed)

Event Timeline

chill created this revision. · Jul 7 2021, 3:46 AM

Herald added subscribers: ormris, kerbowa, jfb and 4 others. · Jul 7 2021, 3:46 AM

chill requested review of this revision. · Jul 7 2021, 3:46 AM

Herald added a project: Restricted Project. · Jul 7 2021, 3:46 AM

Herald added a subscriber: llvm-commits.

chill added a parent revision: D105544: Refactor and update comments in the MergedLoadStoreMotion pass (NFC). · Jul 7 2021, 3:46 AM

chill mentioned this in D105544: Refactor and update comments in the MergedLoadStoreMotion pass (NFC).

chill added reviewers: SjoerdMeijer, sanwou01, Gerolf, dendibakh, bjope. · Jul 7 2021, 3:58 AM

Oh, and I'm going to work on adding tests.

lebedev.ri retitled this revision from Conditional store elimination to [MergedLoadStoreMotion] Conditional store elimination. · Jul 7 2021, 4:13 AM

lebedev.ri edited the summary of this revision.

Warning: this pass clearly modifies CFG, and clearly does not update dominator tree,
which means you can not use dominator tree in this pass,
at least not until fixing the pass to preserve it when modifying CFG.

Harbormaster completed remote builds in B112759: Diff 356915. · Jul 7 2021, 4:26 AM

chill added inline comments. · Jul 8 2021, 2:43 AM

llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
492	Note to self: should be `V == cast<StoreInst>(I)->getValueOperand())`

Updated with a very different implementation. Now handles non-local stores as well.
Updated the description with a justification why I think this transformation is correct.

Herald added subscribers: asbirlea, george.burgess.iv. · Jul 9 2021, 10:49 AM

In D105545#2861643, @lebedev.ri wrote:

Warning: this pass clearly modifies CFG, and clearly does not update dominator tree,
which means you can not use dominator tree in this pass,
at least not until fixing the pass to preserve it when modifying CFG.

Thanks. I think all accesses to the dominator tree (and on the next variants to the MemorySSA graph) occur
before any changes to the CFG.

Harbormaster completed remote builds in B113242: Diff 357569. · Jul 9 2021, 11:47 AM

In this update:

- added tests
- update the Dom tree, preserve DomTreeAnalysis
- small fixes here and there

bjope resigned from this revision. · Jul 13 2021, 2:28 AM

Harbormaster completed remote builds in B113692: Diff 358202. · Jul 13 2021, 3:05 AM

SjoerdMeijer added reviewers: davide, mcrosier, jaykang10. · Jul 15 2021, 1:52 AM

SjoerdMeijer added a reviewer: fhahn.

It looks like the patch does Partial Redundancy Elimination for store with triangle CFG... As far as I know, LLVM has passes to support the PRE.
If you have already checked the llvm passes, can you let me know why the passes do not handle the triangle CFG with store instruction please?

In D105545#2879424, @jaykang10 wrote:

It looks like the patch does Partial Redundancy Elimination for store with triangle CFG... As far as I know, LLVM has passes to support the PRE.
If you have already checked the llvm passes, can you let me know why the passes do not handle the triangle CFG with store instruction please?

I'm not aware of a pass that does PRE on stores. GVN does a limited form of PRE
on diamonds, but it does not deal with void instructions (such as stores).
DSE only handles fully redundant stores, as far as I can tell.

Even if we had PRE on stores, we would still need this pass (or a variant thereof) to
create the (partial) redundancy, to feed to or as a step of a (hypothetical?)
Partial-DSE pass, since we don't actually have a case for PRE here: there are no
two stores, one of which could be considered (partially) redundant.

In any case, I'm open to suggestions about better places for this
transformation.

@mkazantsev I can see you have worked with PRE recently. If possible, can you recommend the point where this transformation is useful please?

Ping.

Adding @efriedma to see if this is the right place for this.

Assuming it is, for now, I plan to look at this soon.

High-level question(s) first before I dive more into the details: I was wondering if the main benefit of this is simplification of control flow? Which then probably relies on the store that is being sunk being the only instruction in the block? In other words, I was wondering if this is always a good thing to do. Or can we regress things if it's not the only instruction, or if the other path (the one that does not have the store) now has to deal with the condition?

llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
99	It's probably good to add some of the justification and legality considerations here, i.e. the things you added to the description of this ticket.

At a high level, I think this needs some heuristic to drive it. I can see this being profitable under two circumstances, broadly speaking:

- Sinking the store enables DSE to eliminate an earlier store.
- Sinking the store enables vectorization.

Otherwise, you're just performing extra memory operations, which doesn't seem like an improvement.
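
As an illustration of the first case, here is a source-level sketch. It is not taken from the patch or its tests, and the function names `before` and `after` are hypothetical. In `before`, the first store cannot be deleted because its value is the one observed when `cond` is false. Once the conditional store is made unconditional, the load inserted on the other path reads the value just stored and can be forwarded, and the first store is then overwritten on every path.

```cpp
#include <cassert>

// Before: the first store must stay, since it provides the final value
// whenever cond is false.
void before(bool cond, int *p, int a, int b) {
  assert(p != nullptr);
  *p = a;
  if (cond)
    *p = b;
}

// After sinking and cleanup: the load on the else path forwards to `a`,
// the earlier store is dead, and a single store remains.
void after(bool cond, int *p, int a, int b) {
  assert(p != nullptr);
  *p = cond ? b : a;
}
```

Without an earlier store of this kind, the transformed code performs a load and a store on a path that previously touched no memory at all.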

llvm/test/Other/opt-O3-pipeline.ll
144	We don't want an additional MemorySSA run, if we can avoid it. If we're going to do this, maybe stick the code into Dead Store Elimination?

Hello,

Thanks for the comments and the suggestions.
The main objective of this patch is to simplify the CFG, but, indeed, the patch itself does not
make the CFG simpler.
As it turns out, there's already a very similar transformation done in SimplifyCFG, so I have opted
to add extra correctness checks (and, as a result, to trigger in more cases) *there*.

I'm abandoning this patch in favour of https://reviews.llvm.org/D107281

chill abandoned this revision. · Aug 3 2021, 2:06 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp	349 lines
llvm/test/CodeGen/AMDGPU/opt-pipeline.ll	8 lines
llvm/test/Other/opt-LTO-pipeline.ll	3 lines
llvm/test/Other/opt-O2-pipeline.ll	4 lines
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll	4 lines
llvm/test/Other/opt-O3-pipeline.ll	4 lines
llvm/test/Other/opt-Os-pipeline.ll	4 lines
llvm/test/Transforms/InstMerge/cond-store-elim.ll	460 lines
llvm/test/Transforms/InstMerge/st_sink_split_bb.ll	4 lines

Diff 358202

llvm/lib/Transforms/Scalar/MergedLoadStoreMotion.cpp

//===- MergedLoadStoreMotion.cpp - merge and sink stores ------------------===//		//===- MergedLoadStoreMotion.cpp - merge and sink stores ------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
//! \file		//! \file
//! This pass performs merges of stores on both sides of a		//! This pass performs merges of stores on both sides of a
// diamond (hammock).		// diamond (hammock).
//		//
// The algorithm iteratively sinks two stores to the same address out of a		// The algorithm iteratively sinks two stores to the same address out of a
// diamond (hammock) and merges them into a single store to the tail block		// diamond (hammock) and merges them into a single store to the tail block
// (footer). The algorithm iterates over the instructions of one side of the		// (footer). The algorithm iterates over the instructions of one side of the
// diamond and attempts to find a matching store on the other side. New		// diamond and attempts to find a matching store on the other side. New
// tail/footer block may be insterted if the tail/footer block has more		// tail/footer block may be inserted if the tail/footer block has more
// predecessors (not only the two predecessors that are forming the diamond). It		// predecessors (not only the two predecessors that are forming the diamond). It
// sinks when it thinks it safe to do so. This optimization helps with eg.		// sinks when it thinks it safe to do so. This optimization helps with eg.
// hiding load latencies, triggering if-conversion, and reducing static code		// hiding load latencies, triggering if-conversion, and reducing static code
// size.		// size.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
//		//
[... 31 lines not shown ...]
// + +		// + +
// if.end ("footer"):		// if.end ("footer"):
// %s.sink = phi [%st, if.then], [%se, if.else]		// %s.sink = phi [%st, if.then], [%se, if.else]
// <...>		// <...>
// store %s.sink, %addr_s		// store %s.sink, %addr_s
// <...>		// <...>
//		//
//		//
		// This pass also replaces conditional stores with unconditional ones (subject to
		Lint: Pre-merge checks: clang-format: please reformat the code (move "to" from the end of the first comment line to the start of the second).
		// constraints). Given a triangle shape before the transformation:
		//
		// header:
		// br %cond, label %if.then, label %if.end
		// + +
		// + +
		// + +
		// if.then: +
		// store %s, %addr +
		// br label %if.end +
		// + +
		// + +
		// + +
		// if.end ("footer"):
		// <...>
		//
		// After the conditional store replacement, this becomes:
		// header:
		// br %cond, label %if.then, label %if.else
		// + +
		// + +
		// + +
		// if.then: if.else:
		// br label %if.end load %s.old, %addr
		// br label %if.end
		// + +
		// + +
		// + +
		// if.end ("footer"):
		// %s.new = phi [%s, if.then], [%s.old, if.else]
		// <...>
		// store %s.new, %addr_s
		// <...>
		//
		SjoerdMeijer (inline comment, not done): It's probably good to add some of the justification and legality considerations here, i.e. the things you added to the description of this ticket.
//===----------------------- TODO -----------------------------------------===//		//===----------------------- TODO -----------------------------------------===//
//		//
// 1) Generalize to regions other than diamonds		// 1) Generalize to regions other than diamonds
// 2) Be more aggressive merging memory operations		// 2) Be more aggressive merging memory operations
// Note that both changes require register pressure control		// Note that both changes require register pressure control
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar/MergedLoadStoreMotion.h"		#include "llvm/Transforms/Scalar/MergedLoadStoreMotion.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/Loads.h"		#include "llvm/Analysis/Loads.h"
		#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "mldst-motion"		#define DEBUG_TYPE "mldst-motion"

namespace {		namespace {
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// MergedLoadStoreMotion Pass		// MergedLoadStoreMotion Pass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
class MergedLoadStoreMotion {		class MergedLoadStoreMotion {
AliasAnalysis *AA = nullptr;		AliasAnalysis *AA = nullptr;
		MemorySSA *MSSA = nullptr;
		DominatorTree *DT = nullptr;

// The mergeStores algorithms could have Size0 * Size1 complexity,		// The mergeStores algorithms could have Size0 * Size1 complexity,
// where Size0 and Size1 are the number instructions on the two sides of		// where Size0 and Size1 are the number instructions on the two sides of
// the diamond. The constant chosen here is arbitrary. Compiler Time		// the diamond. The constant chosen here is arbitrary. Compiler Time
// Control is enforced by the check Size0 * Size1 < MagicCompileTimeControl.		// Control is enforced by the check Size0 * Size1 < MagicCompileTimeControl.
const int MagicCompileTimeControl = 250;		const int MagicCompileTimeControl = 250;

const bool SplitFooterBB;		const bool SplitFooterBB;
public:		public:
MergedLoadStoreMotion(bool SplitFooterBB) : SplitFooterBB(SplitFooterBB) {}		MergedLoadStoreMotion(bool SplitFooterBB) : SplitFooterBB(SplitFooterBB) {}
std::pair<bool, bool> run(Function &F, AliasAnalysis &AA);		std::pair<bool, bool> run(Function &F, AliasAnalysis &AA, MemorySSA &MSSA,
		DominatorTree &DT);

private:		private:
bool isDiamondHead(BasicBlock *BB, BasicBlock *&Left, BasicBlock *&Right,		bool isDiamondHead(BasicBlock *BB, BasicBlock *&Left, BasicBlock *&Right,
BasicBlock *&Tail);		BasicBlock *&Tail);
		bool isTriangleHead(BasicBlock *BB, BasicBlock *&Side, BasicBlock *&Tail);
// Routines for sinking stores		// Routines for sinking stores
StoreInst *canSinkFromBlock(BasicBlock *BB, StoreInst *SI);		StoreInst *canSinkFromBlock(BasicBlock *BB, StoreInst *SI);
PHINode *getPHIOperand(BasicBlock *BB, StoreInst *S0, StoreInst *S1);		PHINode *getPHIOperand(BasicBlock *BB, StoreInst *S0, StoreInst *S1);
bool isStoreSinkBarrierInRange(const Instruction &Start,		bool isStoreSinkBarrierInRange(const Instruction &Start,
const Instruction &End, MemoryLocation Loc);		const Instruction &End, MemoryLocation Loc);
bool canSinkStoresAndGEPs(StoreInst *S0, StoreInst *S1) const;		bool canSinkStoresAndGEPs(StoreInst *S0, StoreInst *S1) const;
void sinkStoresAndGEPs(BasicBlock *BB, StoreInst *SinkCand,		void sinkStoresAndGEPs(BasicBlock *BB, StoreInst *SinkCand,
StoreInst *ElseInst);		StoreInst *ElseInst);
std::pair<bool, bool> mergeStores(BasicBlock *Head, BasicBlock *Left,		std::pair<bool, bool> mergeStores(BasicBlock *Head, BasicBlock *Left,
BasicBlock *Right, BasicBlock *Tail);		BasicBlock *Right, BasicBlock *Tail,
		DomTreeUpdater &DTU);
		void sinkStore(StoreInst *, DomTreeUpdater &DTU);
		std::pair<bool, bool> checkEscapeAndDomination(const StoreInst *S,
		const AllocaInst *AI);
		bool isNonTrappingStore(const StoreInst *S, bool IsLocal, bool IsEscaping);
		bool isLegalToReplaceStore(const StoreInst *S);
};		};
} // end anonymous namespace		} // end anonymous namespace

///		///
/// True when BB is the head of a diamond (hammock)		/// True when BB is the head of a diamond (hammock)
///		///
bool MergedLoadStoreMotion::isDiamondHead(BasicBlock *BB, BasicBlock *&Left,		bool MergedLoadStoreMotion::isDiamondHead(BasicBlock *BB, BasicBlock *&Left,
BasicBlock *&Right,		BasicBlock *&Right,
[... 14 lines not shown ...]
if (!Right->getSinglePredecessor())		if (!Right->getSinglePredecessor())
return false;		return false;

Tail = Left->getSingleSuccessor();		Tail = Left->getSingleSuccessor();
return Tail && Tail == Right->getSingleSuccessor();		return Tail && Tail == Right->getSingleSuccessor();
}		}

///		///
		/// True when BB is the head of a triangle
		///
		bool MergedLoadStoreMotion::isTriangleHead(BasicBlock *BB, BasicBlock *&Side,
		BasicBlock *&Tail) {
		if (!BB)
		return false;
		auto *BI = dyn_cast<BranchInst>(BB->getTerminator());
		if (!BI || !BI->isConditional())
		return false;

		BasicBlock *Succ0 = BI->getSuccessor(0);
		BasicBlock *Succ1 = BI->getSuccessor(1);

		if (Succ0->hasNPredecessors(1)) {
		Side = Succ0;
		Tail = Succ1;
		return Side->getSingleSuccessor() == Tail;
		}

		if (Succ1->hasNPredecessors(1)) {
		Side = Succ1;
		Tail = Succ0;
		return Side->getSingleSuccessor() == Tail;
		}

		return false;
		}

		///
/// True when instruction is a sink barrier for a store		/// True when instruction is a sink barrier for a store
/// located in Loc		/// located in Loc
///		///
/// Whenever an instruction could possibly read or modify the		/// Whenever an instruction could possibly read or modify the
/// value being stored or protect against the store from		/// value being stored or protect against the store from
/// happening it is considered a sink barrier.		/// happening it is considered a sink barrier.
///		///
bool MergedLoadStoreMotion::isStoreSinkBarrierInRange(const Instruction &Start,		bool MergedLoadStoreMotion::isStoreSinkBarrierInRange(const Instruction &Start,
[... 105 lines not shown ...]
/// True when two stores are equivalent and can sink into the footer		/// True when two stores are equivalent and can sink into the footer
///		///
/// Starting from a diamond head block, iterate over the instructions in one		/// Starting from a diamond head block, iterate over the instructions in one
/// successor block and try to match a store in the second successor.		/// successor block and try to match a store in the second successor.
///		///
std::pair<bool, bool> MergedLoadStoreMotion::mergeStores(BasicBlock *HeadBB,		std::pair<bool, bool> MergedLoadStoreMotion::mergeStores(BasicBlock *HeadBB,
BasicBlock *LeftBB,		BasicBlock *LeftBB,
BasicBlock *RightBB,		BasicBlock *RightBB,
BasicBlock *TailBB) {		BasicBlock *TailBB,
		DomTreeUpdater &DTU) {
bool MergedStores = false;		bool MergedStores = false;
bool SplitFooter = false;		bool SplitFooter = false;
BasicBlock *SinkBB = TailBB;		BasicBlock *SinkBB = TailBB;
// Bail out early if we can not merge into the footer BB.		// Bail out early if we can not merge into the footer BB.
if (!SplitFooterBB && TailBB->hasNPredecessorsOrMore(3))		if (!SplitFooterBB && TailBB->hasNPredecessorsOrMore(3))
return {false, false};		return {false, false};
// #Instructions in LeftBB for Compile Time Control		// #Instructions in LeftBB for Compile Time Control
auto InstsNoDbg = RightBB->instructionsWithoutDebug();		auto InstsNoDbg = RightBB->instructionsWithoutDebug();
[... 21 lines not shown ...]
}		}
if (!canSinkStoresAndGEPs(S0, S1))		if (!canSinkStoresAndGEPs(S0, S1))
// Don't attempt to sink below stores that had to stick around		// Don't attempt to sink below stores that had to stick around
break;		break;

if (SinkBB == TailBB && TailBB->hasNPredecessorsOrMore(3)) {		if (SinkBB == TailBB && TailBB->hasNPredecessorsOrMore(3)) {
// We have more than 2 predecessors. Insert a new block		// We have more than 2 predecessors. Insert a new block
// postdominating 2 predecessors we're going to sink from.		// postdominating 2 predecessors we're going to sink from.
SinkBB = SplitBlockPredecessors(TailBB, {LeftBB, RightBB}, ".sink.split");		SinkBB = SplitBlockPredecessors(TailBB, {LeftBB, RightBB}, ".sink.split",
		&DTU);
if (!SinkBB)		if (!SinkBB)
break;		break;
SplitFooter = true;		SplitFooter = true;
}		}

MergedStores = true;		MergedStores = true;
sinkStoresAndGEPs(SinkBB, S0, S1);		sinkStoresAndGEPs(SinkBB, S0, S1);
}		}
return {MergedStores, SplitFooter};		return {MergedStores, SplitFooter};
}		}

std::pair<bool, bool> MergedLoadStoreMotion::run(Function &F,		///
AliasAnalysis &AA) {		/// Check whether the address `AI` of a local variable may escape from the
this->AA = &AA;		/// function and whether a load from or a store to precisely the same
		/// location as the store `S` properly dominates `S`. `S` must be a simple store
		/// to the object, allocated by `AI`. Return the escape property in the first
		/// element and the domination property in the second element of the pair.
		///
		std::pair<bool, bool>
		MergedLoadStoreMotion::checkEscapeAndDomination(const StoreInst *S,
		const AllocaInst *AI) {
		bool Escape = false;
		bool Dominate = false;
		SmallPtrSet<const PHINode *, 4> Visited;
		SmallVector<const Value *, 16> Worklist;

		Worklist.push_back(AI);
		while (!(Escape && Dominate) && !Worklist.empty()) {
		const Value *V = Worklist.pop_back_val();
		for (const User *U : V->users()) {
		const Instruction *I = dyn_cast<Instruction>(U);
		if (I == nullptr || I == S)
		continue;
		switch (I->getOpcode()) {
		default:
		// If not handled in one of the other cases, conservatively assume the
		// value escapes.
		Escape = true;
		break;
		case Instruction::Ret:
		case Instruction::AtomicRMW:
		// A pointer cannot escape via these instructions.
		break;
		case Instruction::Load:
		if (!Dominate &&
		AA->alias(MemoryLocation::get(S), MemoryLocation::get(I)) ==
		AliasResult::MustAlias)
		Dominate = DT->dominates(I, S);
		break;
		case Instruction::Call: {
		const auto *CI = cast<CallInst>(I);
		if (CI->isDebugOrPseudoInst() || CI->isLifetimeStartOrEnd())
		break;
		}
		LLVM_FALLTHROUGH;
		case Instruction::Invoke:
		case Instruction::CallBr:
		if (any_of(cast<CallBase>(I)->data_ops(),
		[=](const Use &Arg) { return Arg.get() == V; }))
		Escape = true;
		break;
		case Instruction::Store:
		if (V == cast<StoreInst>(I)->getValueOperand())
		Escape = true;
		if (!Dominate &&
		AA->alias(MemoryLocation::get(S), MemoryLocation::get(I)) ==
		AliasResult::MustAlias)
		Dominate = DT->dominates(I, S);
		break;
		case Instruction::AtomicCmpXchg:
		if (V == cast<AtomicCmpXchgInst>(I)->getNewValOperand())
		Escape = true;
		break;
		case Instruction::PHI:
		if (!Visited.insert(cast<PHINode>(I)).second)
		break;
		LLVM_FALLTHROUGH;
		case Instruction::GetElementPtr:
		case Instruction::BitCast:
		case Instruction::AddrSpaceCast:
		case Instruction::Select:
		Worklist.push_back(I);
		break;
		}
		}
		}

		return {Escape, Dominate};
		}

		bool MergedLoadStoreMotion::isNonTrappingStore(const StoreInst *S, bool IsLocal,
		bool IsEscaping) {
		SmallVector<const MemoryAccess *, 16> Worklist;
		SmallPtrSet<const MemoryPhi *, 4> Visited;

		MemoryAccess *SDef = MSSA->getMemoryAccess(S);
		Worklist.append(SDef->defs_begin(), SDef->defs_end());

		while (!Worklist.empty()) {
		const MemoryAccess *M = Worklist.pop_back_val();

		// Can't guarantee anything if we reach the initial node.
		if (MSSA->isLiveOnEntryDef(M))
		return false;
		chill (author, inline comment, done): Note to self: should be `V == cast<StoreInst>(I)->getValueOperand())`

		// Continue traversing along the MemoryPhi operands.
		if (const auto *Phi = dyn_cast<MemoryPhi>(M)) {
		if (Visited.insert(Phi).second)
		Worklist.append(Phi->defs_begin(), Phi->defs_end());
		continue;
		}

bool Changed = false;		assert(isa<MemoryDef>(M) && "Unexpected MemoryUse");
bool CFGChanged = false;		const auto *MDef = cast<MemoryDef>(M);
		Instruction *I = MDef->getMemoryInst();

		// We can reach the same instruction (along a cyclic path), stop traversing
		// at this point.
		if (I == S)
		continue;

		switch (I->getOpcode()) {
		default:
		// Unknown MemoryDef or an atomic instruction, assume invalidating and
		// bail out.
		return false;
		case Instruction::Store:
		// Ordered or volatile store, assume invalidating.
		if (!cast<StoreInst>(I)->isSimple())
		return false;
		// A write to precisely the same location feeding into the candidate store
		// "shields" it from traps along the dependency chain.
		// A write to a separate or partially overlapping memory location does
		// not guarantee the candidate store is non-trapping, but does not
		// preclude it from being such either, so continue traversing up the
		// dependency chain.
		if (AA->alias(MemoryLocation::get(S), MemoryLocation::get(I)) !=
		AliasResult::MustAlias)
		Worklist.push_back(MDef->getDefiningAccess());
		break;
		case Instruction::Call:
		if (!IsLocal || IsEscaping) {
		const auto *CI = cast<CallInst>(I);
		if (!CI->isDebugOrPseudoInst() && !CI->isLifetimeStartOrEnd())
		// TODO: Continue traversing across well-behaved calls: stdlib
		// functions, readonly/readnone, etc.
		return false;
		}
		Worklist.push_back(MDef->getDefiningAccess());
		break;
		}
		}

		return true;
		}

		// Check if it's legal to replace a conditional store with a load and
		// unconditional store.
		bool MergedLoadStoreMotion::isLegalToReplaceStore(const StoreInst *S) {
		if (!S->isSimple())
		return false;
		bool IsLocal = false, IsEscaping = false, HasDominating = false;
		if (auto *AI =
		dyn_cast<AllocaInst>(getUnderlyingObject(S->getPointerOperand()))) {
		IsLocal = true;
		std::tie(IsEscaping, HasDominating) = checkEscapeAndDomination(S, AI);
		}
		if (IsLocal && !IsEscaping && HasDominating)
		return true;
		return isNonTrappingStore(S, IsLocal, IsEscaping);
		}

		void MergedLoadStoreMotion::sinkStore(StoreInst *S, DomTreeUpdater &DTU) {
		BasicBlock *Body = S->getParent();
		BasicBlock *Head = Body->getSinglePredecessor();
		BasicBlock *Tail = Body->getSingleSuccessor();
		assert(Head && Tail && "Invalid CFG shape for single store replacement");

		// Create a load on the "other" side of the branch.
		BasicBlock *NewBody = SplitBlockPredecessors(Tail, {Head}, ".for.load", &DTU);
		Value *V = S->getValueOperand();
		auto *L =
		new LoadInst(V->getType(), S->getPointerOperand(), V->getName() + ".old",
		&*NewBody->getFirstInsertionPt());

		// Sink the store to the tail block.
		if (Tail->hasNPredecessorsOrMore(3))
		Tail = SplitBlockPredecessors(Tail, {Head, Body}, ".sink", &DTU);

		LLVM_DEBUG(dbgs() << "Sink single store into ";
		Tail->printAsOperand(dbgs(), false);
		Lint: Pre-merge checks: clang-format: please reformat the code (rewrap the statements inside LLVM_DEBUG).
		dbgs() << "\nStore: "; S->dump();
		dbgs() << "Load: "; L->dump());
		S->removeFromParent();
		S->insertBefore(&*Tail->getFirstInsertionPt());

		auto *Phi =
		PHINode::Create(V->getType(), 2, V->getName() + ".new", &Tail->front());
		Phi->addIncoming(V, Body);
		Phi->addIncoming(L, NewBody);
		S->setOperand(0, Phi);
		}

		std::pair<bool, bool> MergedLoadStoreMotion::run(Function &F, AliasAnalysis &AA,
		MemorySSA &MSSA,
		DominatorTree &DT) {
		this->AA = &AA;
		this->MSSA = &MSSA;
		this->DT = &DT;

LLVM_DEBUG(dbgs() << "Instruction Merger\n");		LLVM_DEBUG(dbgs() << "Instruction Merger\n");

		// Collect conditional stores which are legal to replace with unconditional
		// ones. Do this before making any changes to the function, so we are using up
		// to date dominator tree and Memory SSA graph.
		SmallVector<StoreInst *, 4> CondStores;
		for (BasicBlock &BB : F) {
		BasicBlock *IfBody, *IfTail;
		if (!isTriangleHead(&BB, IfBody, IfTail))
		continue;

		// Have to insert a new tail, but it's disallowed.
		if (!SplitFooterBB && IfTail->hasNPredecessorsOrMore(3))
		continue;

		// Don't sink if there are more than two (terminator and store) instructions
		// in the block.
		Instruction *I = IfBody->getTerminator()->getPrevNonDebugInstruction(true);
		if (I == nullptr)
		continue;
		if (Instruction *P = I->getPrevNonDebugInstruction(true))
		continue;

		// Don't sink if it could introduce data race or invalid memory
		// access to do so.
		auto *S = dyn_cast<StoreInst>(I);
		if (S == nullptr || !isLegalToReplaceStore(S))
		continue;

		CondStores.push_back(S);
		}

		DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);

		// Now that we have collected the stores, do the replacement.
		for (StoreInst *S : CondStores)
		sinkStore(S, DTU);

		bool CFGChanged = !CondStores.empty();
		bool Changed = CFGChanged;

// Merge unconditional branches, allowing PRE to catch more		// Merge unconditional branches, allowing PRE to catch more
// optimization opportunities.		// optimization opportunities.
// This loop doesn't care about newly inserted/split blocks		// This loop doesn't care about newly inserted/split blocks
// since they never will be diamond heads.		// since they never will be diamond heads.
for (BasicBlock &BB : make_early_inc_range(F)) {		for (BasicBlock &BB : make_early_inc_range(F)) {
BasicBlock *Left, *Right, *Tail;		BasicBlock *Left, *Right, *Tail;
if (isDiamondHead(&BB, Left, Right, Tail)) {		if (isDiamondHead(&BB, Left, Right, Tail)) {
// Merge and sink store pairs outside diamonds when possible.		// Merge and sink store pairs outside diamonds when possible.
bool Change, ChangeCFG;		bool Change, ChangeCFG;
std::tie(Change, ChangeCFG) = mergeStores(&BB, Left, Right, Tail);		std::tie(Change, ChangeCFG) = mergeStores(&BB, Left, Right, Tail, DTU);
Changed |= Change;		Changed |= Change;
CFGChanged |= ChangeCFG;		CFGChanged |= ChangeCFG;
}		}
}		}

return {Changed, CFGChanged};		return {Changed, CFGChanged};
}		}
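
The candidate filter in the collection loop above (the if-body must hold nothing but one store and its terminator, ignoring debug instructions) can be illustrated with a toy model. This is a sketch under stated assumptions: `Kind` and `isSingleStoreBody` are hypothetical and stand in for LLVM's instruction classes and the `getPrevNonDebugInstruction` walk; they are not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction kinds for a toy block model (not LLVM types).
enum class Kind { Debug, Store, Load, Other, Branch };

// Returns true when, ignoring debug entries, the block consists of exactly
// one store followed by its terminating branch.
bool isSingleStoreBody(const std::vector<Kind> &Block) {
  std::vector<Kind> Real;
  for (Kind K : Block)
    if (K != Kind::Debug)
      Real.push_back(K);
  return Real.size() == 2 && Real[0] == Kind::Store &&
         Real[1] == Kind::Branch;
}
```

A block containing any extra non-debug instruction, or no store at all, is rejected, matching the early `continue` statements in the loop.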

namespace {		namespace {
class MergedLoadStoreMotionLegacyPass : public FunctionPass {		class MergedLoadStoreMotionLegacyPass : public FunctionPass {
const bool SplitFooterBB;		const bool SplitFooterBB;
public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
MergedLoadStoreMotionLegacyPass(bool SplitFooterBB = false)		MergedLoadStoreMotionLegacyPass(bool SplitFooterBB = false)
: FunctionPass(ID), SplitFooterBB(SplitFooterBB) {		: FunctionPass(ID), SplitFooterBB(SplitFooterBB) {
initializeMergedLoadStoreMotionLegacyPassPass(		initializeMergedLoadStoreMotionLegacyPassPass(
*PassRegistry::getPassRegistry());		*PassRegistry::getPassRegistry());
}		}

///		///
/// Run the transformation for each function		/// Run the transformation for each function
///		///
bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;
MergedLoadStoreMotion Impl(SplitFooterBB);		MergedLoadStoreMotion Impl(SplitFooterBB);
bool Change, _;		bool Change, _;
std::tie(Change, _) =		std::tie(Change, _) =
Impl.run(F, getAnalysis<AAResultsWrapperPass>().getAAResults());		Impl.run(F, getAnalysis<AAResultsWrapperPass>().getAAResults(),
		getAnalysis<MemorySSAWrapperPass>().getMSSA(),
		getAnalysis<DominatorTreeWrapperPass>().getDomTree());
return Change;		return Change;
}		}

private:		private:
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
if (!SplitFooterBB)
AU.setPreservesCFG();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
		AU.addRequired<MemorySSAWrapperPass>();
		AU.addRequired<DominatorTreeWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
}		}
};		};

char MergedLoadStoreMotionLegacyPass::ID = 0;		char MergedLoadStoreMotionLegacyPass::ID = 0;
} // anonymous namespace		} // anonymous namespace

///		///
/// createMergedLoadStoreMotionPass - The public interface to this file.		/// createMergedLoadStoreMotionPass - The public interface to this file.
///		///
FunctionPass *llvm::createMergedLoadStoreMotionPass(bool SplitFooterBB) {		FunctionPass *llvm::createMergedLoadStoreMotionPass(bool SplitFooterBB) {
return new MergedLoadStoreMotionLegacyPass(SplitFooterBB);		return new MergedLoadStoreMotionLegacyPass(SplitFooterBB);
}		}

INITIALIZE_PASS_BEGIN(MergedLoadStoreMotionLegacyPass, "mldst-motion",		INITIALIZE_PASS_BEGIN(MergedLoadStoreMotionLegacyPass, "mldst-motion",
"MergedLoadStoreMotion", false, false)		"MergedLoadStoreMotion", false, false)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_END(MergedLoadStoreMotionLegacyPass, "mldst-motion",		INITIALIZE_PASS_END(MergedLoadStoreMotionLegacyPass, "mldst-motion",
"MergedLoadStoreMotion", false, false)		"MergedLoadStoreMotion", false, false)

PreservedAnalyses		PreservedAnalyses
MergedLoadStoreMotionPass::run(Function &F, FunctionAnalysisManager &AM) {		MergedLoadStoreMotionPass::run(Function &F, FunctionAnalysisManager &AM) {
MergedLoadStoreMotion Impl(Options.SplitFooterBB);		MergedLoadStoreMotion Impl(Options.SplitFooterBB);
auto &AA = AM.getResult<AAManager>(F);		auto &AA = AM.getResult<AAManager>(F);
		auto &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);

bool Change, CFGChange;		bool Change, CFGChange;
std::tie(Change, CFGChange) = Impl.run(F, AA);		std::tie(Change, CFGChange) = Impl.run(F, AA, MSSA, DT);
if (!Change && !CFGChange)		if (!Change)
return PreservedAnalyses::all();		return PreservedAnalyses::all();

PreservedAnalyses PA;		PreservedAnalyses PA;
		PA.preserve<DominatorTreeAnalysis>();
if (!CFGChange)		if (!CFGChange)
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
return PA;		return PA;
}		}
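
The reporting logic in the right-hand (new) column of the new pass manager entry point above reduces to a small truth table. The sketch below models it with plain booleans; `Preserved` and `preservedAfter` are hypothetical names used only for illustration and do not correspond to LLVM's `PreservedAnalyses` API.

```cpp
#include <cassert>

// Toy stand-in for the set of analyses reported as preserved.
struct Preserved {
  bool All;         // nothing changed, everything is preserved
  bool DomTree;     // dominator tree is preserved
  bool CFGAnalyses; // analyses that depend only on the CFG are preserved
};

Preserved preservedAfter(bool Change, bool CFGChange) {
  if (!Change)
    return {true, true, true};
  // The dominator tree is reported as preserved on every changed path;
  // CFG-only analyses survive only when the CFG was left intact.
  return {false, true, !CFGChange};
}
```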

llvm/test/CodeGen/AMDGPU/opt-pipeline.ll

	; GCN-O2-NEXT: Scalar Evolution Analysis			; GCN-O2-NEXT: Scalar Evolution Analysis
	; GCN-O2-NEXT: Loop Pass Manager			; GCN-O2-NEXT: Loop Pass Manager
	; GCN-O2-NEXT: Recognize loop idioms			; GCN-O2-NEXT: Recognize loop idioms
	; GCN-O2-NEXT: Induction Variable Simplification			; GCN-O2-NEXT: Induction Variable Simplification
	; GCN-O2-NEXT: Delete dead loops			; GCN-O2-NEXT: Delete dead loops
	; GCN-O2-NEXT: Unroll loops			; GCN-O2-NEXT: Unroll loops
	; GCN-O2-NEXT: SROA			; GCN-O2-NEXT: SROA
	; GCN-O2-NEXT: Function Alias Analysis Results			; GCN-O2-NEXT: Function Alias Analysis Results
				; GCN-O2-NEXT: Memory SSA
	; GCN-O2-NEXT: MergedLoadStoreMotion			; GCN-O2-NEXT: MergedLoadStoreMotion
				; GCN-O2-NEXT: Dominator Tree Construction
				; GCN-O2-NEXT: Natural Loop Information
	; GCN-O2-NEXT: Phi Values Analysis			; GCN-O2-NEXT: Phi Values Analysis
				; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O2-NEXT: Function Alias Analysis Results			; GCN-O2-NEXT: Function Alias Analysis Results
	; GCN-O2-NEXT: Memory Dependence Analysis			; GCN-O2-NEXT: Memory Dependence Analysis
	; GCN-O2-NEXT: Lazy Branch Probability Analysis			; GCN-O2-NEXT: Lazy Branch Probability Analysis
	; GCN-O2-NEXT: Lazy Block Frequency Analysis			; GCN-O2-NEXT: Lazy Block Frequency Analysis
	; GCN-O2-NEXT: Optimization Remark Emitter			; GCN-O2-NEXT: Optimization Remark Emitter
	; GCN-O2-NEXT: Global Value Numbering			; GCN-O2-NEXT: Global Value Numbering
	; GCN-O2-NEXT: Sparse Conditional Constant Propagation			; GCN-O2-NEXT: Sparse Conditional Constant Propagation
	; GCN-O2-NEXT: Demanded bits analysis			; GCN-O2-NEXT: Demanded bits analysis
	; GCN-O3-NEXT: Scalar Evolution Analysis			; GCN-O3-NEXT: Scalar Evolution Analysis
	; GCN-O3-NEXT: Loop Pass Manager			; GCN-O3-NEXT: Loop Pass Manager
	; GCN-O3-NEXT: Recognize loop idioms			; GCN-O3-NEXT: Recognize loop idioms
	; GCN-O3-NEXT: Induction Variable Simplification			; GCN-O3-NEXT: Induction Variable Simplification
	; GCN-O3-NEXT: Delete dead loops			; GCN-O3-NEXT: Delete dead loops
	; GCN-O3-NEXT: Unroll loops			; GCN-O3-NEXT: Unroll loops
	; GCN-O3-NEXT: SROA			; GCN-O3-NEXT: SROA
	; GCN-O3-NEXT: Function Alias Analysis Results			; GCN-O3-NEXT: Function Alias Analysis Results
				; GCN-O3-NEXT: Memory SSA
	; GCN-O3-NEXT: MergedLoadStoreMotion			; GCN-O3-NEXT: MergedLoadStoreMotion
				; GCN-O3-NEXT: Dominator Tree Construction
				; GCN-O3-NEXT: Natural Loop Information
	; GCN-O3-NEXT: Phi Values Analysis			; GCN-O3-NEXT: Phi Values Analysis
				; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
	; GCN-O3-NEXT: Function Alias Analysis Results			; GCN-O3-NEXT: Function Alias Analysis Results
	; GCN-O3-NEXT: Memory Dependence Analysis			; GCN-O3-NEXT: Memory Dependence Analysis
	; GCN-O3-NEXT: Lazy Branch Probability Analysis			; GCN-O3-NEXT: Lazy Branch Probability Analysis
	; GCN-O3-NEXT: Lazy Block Frequency Analysis			; GCN-O3-NEXT: Lazy Block Frequency Analysis
	; GCN-O3-NEXT: Optimization Remark Emitter			; GCN-O3-NEXT: Optimization Remark Emitter
	; GCN-O3-NEXT: Global Value Numbering			; GCN-O3-NEXT: Global Value Numbering
	; GCN-O3-NEXT: Sparse Conditional Constant Propagation			; GCN-O3-NEXT: Sparse Conditional Constant Propagation
	; GCN-O3-NEXT: Demanded bits analysis			; GCN-O3-NEXT: Demanded bits analysis

llvm/test/Other/opt-LTO-pipeline.ll

	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: MemCpy Optimization			; CHECK-NEXT: MemCpy Optimization
	; CHECK-NEXT: Post-Dominator Tree Construction			; CHECK-NEXT: Post-Dominator Tree Construction
	; CHECK-NEXT: Dead Store Elimination			; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
				; CHECK-NEXT: Dominator Tree Construction
				; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: Loop Access Analysis			; CHECK-NEXT: Loop Access Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis

llvm/test/Other/opt-O2-pipeline.ll

	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Recognize loop idioms			; CHECK-NEXT: Recognize loop idioms
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: SROA			; CHECK-NEXT: SROA
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
				; CHECK-NEXT: Memory SSA
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
				; CHECK-NEXT: Dominator Tree Construction
				; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	; CHECK-NEXT: Sparse Conditional Constant Propagation			; CHECK-NEXT: Sparse Conditional Constant Propagation
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis

llvm/test/Other/opt-O3-pipeline-enable-matrix.ll

	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Recognize loop idioms			; CHECK-NEXT: Recognize loop idioms
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: SROA			; CHECK-NEXT: SROA
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
				; CHECK-NEXT: Memory SSA
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
				; CHECK-NEXT: Dominator Tree Construction
				; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	; CHECK-NEXT: Sparse Conditional Constant Propagation			; CHECK-NEXT: Sparse Conditional Constant Propagation
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis

llvm/test/Other/opt-O3-pipeline.ll

	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Recognize loop idioms			; CHECK-NEXT: Recognize loop idioms
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: SROA			; CHECK-NEXT: SROA
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
				; CHECK-NEXT: Memory SSA
				efriedma (inline comment, Not Done): We don't want an additional MemorySSA run, if we can avoid it. If we're going to do this, maybe stick the code into Dead Store Elimination?
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
				; CHECK-NEXT: Dominator Tree Construction
				; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	; CHECK-NEXT: Sparse Conditional Constant Propagation			; CHECK-NEXT: Sparse Conditional Constant Propagation
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis

llvm/test/Other/opt-Os-pipeline.ll

	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Recognize loop idioms			; CHECK-NEXT: Recognize loop idioms
	; CHECK-NEXT: Induction Variable Simplification			; CHECK-NEXT: Induction Variable Simplification
	; CHECK-NEXT: Delete dead loops			; CHECK-NEXT: Delete dead loops
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: SROA			; CHECK-NEXT: SROA
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
				; CHECK-NEXT: Memory SSA
	; CHECK-NEXT: MergedLoadStoreMotion			; CHECK-NEXT: MergedLoadStoreMotion
				; CHECK-NEXT: Dominator Tree Construction
				; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
				; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Global Value Numbering			; CHECK-NEXT: Global Value Numbering
	; CHECK-NEXT: Sparse Conditional Constant Propagation			; CHECK-NEXT: Sparse Conditional Constant Propagation
	; CHECK-NEXT: Demanded bits analysis			; CHECK-NEXT: Demanded bits analysis

llvm/test/Transforms/InstMerge/cond-store-elim.ll

This file was added.

				; RUN: opt --passes=mldst-motion,verify -S %s -o - | FileCheck %s

				; Local store, dominated by a load, non-escaping
				define i32 @f0(i64 %k, i32 %b) {
				entry:
				%a = alloca i64, align 8
				%tmpcast = bitcast i64* %a to [2 x i32]*
				store i64 4294967296, i64* %a, align 8
				%arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %tmpcast, i64 0, i64 %k
				%0 = load i32, i32* %arrayidx, align 4
				%cmp = icmp sgt i32 %0, %b
				br i1 %cmp, label %if.then, label %if.end
				; New control flow
				; CHECK: br i1 %cmp, label %if.then, label %if.end.for.load

				if.then:
				store i32 %b, i32* %arrayidx, align 4
				br label %if.end
				; Store moved out of the block
				; CHECK: if.then:
				; CHECK-NEXT: br label %if.end

				; Load inserted
				; CHECK: if.end.for.load:
				; CHECK-NEXT: %b.old = load i32, i32* %arrayidx, align 4
				; CHECK-NEXT: br label %if.end

				if.end:
				%arrayidx2 = bitcast i64* %a to i32*
				%1 = load i32, i32* %arrayidx2, align 8
				%arrayidx3 = getelementptr inbounds [2 x i32], [2 x i32]* %tmpcast, i64 0, i64 1
				%2 = load i32, i32* %arrayidx3, align 4
				%add = add nsw i32 %2, %1
				ret i32 %add
				; Store moved here
				; CHECK: if.end:
				; CHECK-NEXT: %b.new = phi i32 [ %b, %if.then ], [ %b.old, %if.end.for.load ]
				; CHECK-NEXT: store i32 %b.new, i32* %arrayidx, align 4
				}

				; Local store, dominated by a load, escaping (missed optimisation)
				define i32 @f1(i64 %k, i32 %b) {
				entry:
				%a = alloca i64, align 8
				%tmpcast = bitcast i64* %a to [2 x i32]*
				store i64 4294967296, i64* %a, align 8
				%arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %tmpcast, i64 0, i64 %k
				%0 = load i32, i32* %arrayidx, align 4
				%cmp = icmp sgt i32 %0, %b
				br i1 %cmp, label %if.then, label %if.end
				; CHECK: br i1 %cmp, label %if.then, label %if.end

				if.then:
				store i32 %b, i32* %arrayidx, align 4
				br label %if.end
				; CHECK: if.then:
				; CHECK-NEXT: store i32 %b, i32* %arrayidx, align 4

				if.end:
				%arrayidx2 = bitcast i64* %a to i32*
				%1 = load i32, i32* %arrayidx2, align 8
				%arrayidx3 = getelementptr inbounds [2 x i32], [2 x i32]* %tmpcast, i64 0, i64 1
				%2 = load i32, i32* %arrayidx3, align 4
				%add = add nsw i32 %2, %1
				; Escape of the local variable here prevents the optimisation
				call void @g(i64* %a)
				ret i32 %add
				}

				declare void @g(i64 *)

				; Local store, loads on all paths (missed optimisation)
				define i32 @f2(i64 %k, i64 %j, i32 %b, i32 %c) {
				entry:
				%a = alloca i64, align 8
				%tmpcast = bitcast i64* %a to [2 x i32]*
				store i64 4294967296, i64* %a, align 8
				%arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %tmpcast, i64 0, i64 %k
				%cmp = icmp eq i64 %k, %j
				br i1 %cmp, label %if.then, label %if.else

				; MemorySSA can't be used to reach these two loads
				if.then:
				%0 = load i32, i32* %arrayidx, align 4
				br label %if.end

				if.else:
				%1 = load i32, i32* %arrayidx, align 4
				br label %if.end

				if.end:
				%cmp7 = icmp slt i32 %b, %c
				br i1 %cmp7, label %if.then8, label %if.end9

				if.then8:
				store i32 %b, i32* %arrayidx, align 4
				br label %if.end9
				; CHECK: if.then8:
				; CHECK-NEXT: store i32 %b, i32* %arrayidx, align 4
				; CHECK-NEXT: br label %if.end9

				if.end9:
				%arrayidx10 = bitcast i64* %a to i32*
				%2 = load i32, i32* %arrayidx10, align 8
				%arrayidx11 = getelementptr inbounds [2 x i32], [2 x i32]* %tmpcast, i64 0, i64 1
				%3 = load i32, i32* %arrayidx11, align 4
				%add12 = add nsw i32 %3, %2
				ret i32 %add12
				}

				; Dominating non-local load, not optimised
				define i32 @f3(i64 %k, i32 %b, i32* nocapture %a) {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %k
				%0 = load i32, i32* %arrayidx, align 4
				%cmp = icmp sgt i32 %0, %b
				br i1 %cmp, label %if.then, label %if.end

				if.then:
				store i32 %b, i32* %arrayidx, align 4
				br label %if.end
				; CHECK: if.then:
				; CHECK-NEXT: store i32 %b, i32* %arrayidx, align 4
				; CHECK-NEXT: br label %if.end

				if.end:
				%1 = load i32, i32* %a, align 4
				%arrayidx3 = getelementptr inbounds i32, i32* %a, i64 1
				%2 = load i32, i32* %arrayidx3, align 4
				%add = add nsw i32 %2, %1
				ret i32 %add
				}

				; Dominating non-local store
				define i32 @f4(i64 %k, i64 %j, i32 %b, i32* %a) {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %k
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx1 = getelementptr inbounds i32, i32* %a, i64 %j
				store i32 %0, i32* %arrayidx1, align 4
				%1 = load i32, i32* %arrayidx, align 4
				%cmp = icmp sgt i32 %1, %b
				br i1 %cmp, label %if.then, label %if.end
				; New control flow
				; CHECK: br i1 %cmp, label %if.then, label %if.end.for.load

				if.then:
				store i32 %b, i32* %arrayidx1, align 4
				br label %if.end
				; Store moved out
				; CHECK: if.then:
				; CHECK-NEXT: br label %if.end

				; Load inserted
				; CHECK: if.end.for.load:
				; CHECK-NEXT: %b.old = load i32, i32* %arrayidx1, align 4
				; CHECK-NEXT: br label %if.end

				if.end:
				%2 = load i32, i32* %a, align 4
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 1
				%3 = load i32, i32* %arrayidx5, align 4
				%add = add nsw i32 %2, %3
				ret i32 %add
				; Store moved here
				; CHECK: if.end:
				; CHECK-NEXT: %b.new = phi i32 [ %b, %if.then ], [ %b.old, %if.end.for.load ]
				; CHECK-NEXT: store i32 %b.new, i32* %arrayidx1, align 4
				}

				; Non-local stores on all paths
				define i32 @f5(i64 %k, i64 %j, i32 %b, i32* nocapture %a) #0 {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %k
				%arrayidx1 = getelementptr inbounds i32, i32* %a, i64 %j
				%cmp = icmp slt i64 %k, %j
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%0 = load i32, i32* %arrayidx, align 4
				; This store ...
				store i32 %0, i32* %arrayidx1, align 4
				br label %if.end

				if.else:
				%add = add nsw i64 %j, %k
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %add
				%1 = load i32, i32* %arrayidx2, align 4
				; ... and this one precede the candidate store in all executions.
				store i32 %1, i32* %arrayidx1, align 4
				store i32 1, i32* %arrayidx2, align 4
				br label %if.end

				if.end:
				%2 = load i32, i32* %arrayidx, align 4
				%cmp5 = icmp sgt i32 %2, %b
				br i1 %cmp5, label %if.then6, label %if.end8
				; New control flow
				; CHECK: br i1 %cmp5, label %if.then6, label %if.end8.for.load

				if.then6:
				store i32 %b, i32* %arrayidx1, align 4
				br label %if.end8
				; Store moved out
				; CHECK: if.then6:
				; CHECK-NEXT: br label %if.end8

				; Load inserted
				; CHECK: if.end8.for.load:
				; CHECK-NEXT: %b.old = load i32, i32* %arrayidx1, align 4
				; CHECK-NEXT: br label %if.end8

				if.end8:
				%3 = load i32, i32* %a, align 4
				%arrayidx10 = getelementptr inbounds i32, i32* %a, i64 1
				%4 = load i32, i32* %arrayidx10, align 4
				%add11 = add nsw i32 %4, %3
				ret i32 %add11
				; Store moved here
				; CHECK: if.end8:
				; CHECK-NEXT: %b.new = phi i32 [ %b, %if.then6 ], [ %b.old, %if.end8.for.load ]
				; CHECK-NEXT: store i32 %b.new, i32* %arrayidx1, align 4
				}

				; One path not fully "covered"
				define i32 @f6(i64 %k, i64 %j, i32 %b, i32* nocapture %a) #0 {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %k
				%arrayidx1 = getelementptr inbounds i32, i32* %a, i64 %j
				%cmp = icmp slt i64 %k, %j
				%add = add nsw i64 %j, %k
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %add
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%0 = load i32, i32* %arrayidx, align 4
				store i32 %0, i32* %arrayidx1, align 4
				br label %if.end

				if.else:
				%1 = load i32, i32* %arrayidx2, align 4
				store i32 %1, i32* %arrayidx, align 4
				br label %if.end

				if.end:
				%2 = load i32, i32* %arrayidx, align 4
				%cmp5 = icmp sgt i32 %2, %b
				br i1 %cmp5, label %if.then6, label %if.end8

				if.then6:
				store i32 %b, i32* %arrayidx1, align 4
				br label %if.end8
				; Optimisation did not fire: along entry -> if.else -> if.then6 there is
				; no preceding store to the same location
				; CHECK: if.then6:
				; CHECK-NEXT: store i32 %b, i32* %arrayidx1, align 4
				; CHECK-NEXT: br label %if.end8

				if.end8:
				%3 = load i32, i32* %a, align 4
				%arrayidx10 = getelementptr inbounds i32, i32* %a, i64 1
				%4 = load i32, i32* %arrayidx10, align 4
				%add11 = add nsw i32 %4, %3
				ret i32 %add11
				}
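
The difference between f5 (transformed) and f6 (left alone) comes down to whether every path into the triangle header already writes to the candidate address. A toy version of that check on an acyclic CFG might look like the sketch below. `Block` and `writeOnAllPaths` are hypothetical; the patch itself works on MemorySSA, handles loops, and also looks for invalidating instructions, none of which is modelled here.

```cpp
#include <cassert>
#include <vector>

// Toy CFG node: predecessor indices plus a flag saying whether the block
// contains a store to the address of interest.
struct Block {
  std::vector<int> Preds;
  bool WritesAddr;
};

// Returns true when every path from the entry block to the end of block B
// passes through a write to the address. Assumes the CFG has no cycles.
bool writeOnAllPaths(const std::vector<Block> &CFG, int B) {
  const Block &Cur = CFG[B];
  if (Cur.WritesAddr)
    return true;
  if (Cur.Preds.empty())
    return false; // reached the entry without seeing a write
  for (int P : Cur.Preds)
    if (!writeOnAllPaths(CFG, P))
      return false;
  return true;
}
```

With blocks numbered entry = 0, if.then = 1, if.else = 2, if.end = 3, marking both arms as writing gives the f5 situation, and clearing the flag on if.else gives f6.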

				; "Invalidating" instruction along a path
				define i32 @f7(i64 %k, i64 %j, i32 %b, i32* nocapture %a) #0 {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %k
				%arrayidx1 = getelementptr inbounds i32, i32* %a, i64 %j
				%cmp = icmp slt i64 %k, %j
				%add = add nsw i64 %j, %k
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %add
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%0 = load i32, i32* %arrayidx, align 4
				store i32 %0, i32* %arrayidx1, align 4
				; The call to `h` invalidates whatever we can infer from the store above.
				call void @h()
				br label %if.end

				if.else:
				%1 = load i32, i32* %arrayidx2, align 4
				store i32 %1, i32* %arrayidx1, align 4
				br label %if.end

				if.end:
				%2 = load i32, i32* %arrayidx, align 4
				%cmp5 = icmp sgt i32 %2, %b
				br i1 %cmp5, label %if.then6, label %if.end8

				if.then6:
				store i32 %b, i32* %arrayidx1, align 4
				br label %if.end8
				; Optimisation did not fire.
				; CHECK: if.then6:
				; CHECK-NEXT: store i32 %b, i32* %arrayidx1, align 4
				; CHECK-NEXT: br label %if.end8

				if.end8:
				%3 = load i32, i32* %a, align 4
				%arrayidx10 = getelementptr inbounds i32, i32* %a, i64 1
				%4 = load i32, i32* %arrayidx10, align 4
				%add11 = add nsw i32 %4, %3
				ret i32 %add11
				}

				declare void @h()

				; Invalidating instruction along a cyclic path
				@p = global i32* null, align 8

				define i32 @f8(i64 %k, i64 %j, i32 %b, i32* nocapture %a) {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %k
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx1 = getelementptr inbounds i32, i32* %a, i64 %j
				store i32 %0, i32* %arrayidx1, align 4
				%cmp18 = icmp slt i64 %k, %j
				br i1 %cmp18, label %for.body, label %for.end

				for.body:
				%k.addr.019 = phi i64 [ %inc, %for.inc ], [ %k, %entry]
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %k.addr.019
				%1 = load i32, i32* %arrayidx2, align 4
				%cmp3 = icmp sgt i32 %1, %b
				br i1 %cmp3, label %if.then, label %for.inc

				if.then:
				store i32 %b, i32* %arrayidx1, align 4
				br label %for.inc

				; Optimisation is invalid because of the volatile store below.
				; CHECK: if.then:
				; CHECK-NEXT: store i32 %b, i32* %arrayidx1, align 4
				; CHECK-NEXT: br label %for.inc

				for.inc:
				%ptr = load i32*, i32** @p, align 8
				store volatile i32 1, i32* %ptr, align 8
				%inc = add nsw i64 %k.addr.019, 1
				%exitcond = icmp ne i64 %inc, %j
				br i1 %exitcond, label %for.body, label %for.end

				for.end:
				%2 = load i32, i32* %a, align 4
				%arrayidx6 = getelementptr inbounds i32, i32* %a, i64 1
				%3 = load i32, i32* %arrayidx6, align 4
				%add = add nsw i32 %3, %2
				ret i32 %add
				}

				; Non-local stores on all paths (cyclic version)
				define i32 @f9(i64 %k, i64 %j, i32 %b, i32* nocapture %a) {
				entry:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %k
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx1 = getelementptr inbounds i32, i32* %a, i64 %j
				store i32 %0, i32* %arrayidx1, align 4
				%cmp18 = icmp slt i64 %k, %j
				br i1 %cmp18, label %for.body, label %for.end

				for.body:
				%k.addr.019 = phi i64 [ %inc, %for.inc ], [ %k, %entry]
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %k.addr.019
				%1 = load i32, i32* %arrayidx2, align 4
				%cmp3 = icmp sgt i32 %1, %b
				br i1 %cmp3, label %if.then, label %for.inc
				; New control flow
				; CHECK: br i1 %cmp3, label %if.then, label %for.inc.for.load

				if.then:
				store i32 %b, i32* %arrayidx1, align 4
				br label %for.inc
				; Store moved out
				; CHECK: if.then:
				; CHECK-NEXT: br label %for.inc

				; Load inserted
				; CHECK: for.inc.for.load:
				; CHECK-NEXT: %b.old = load i32, i32* %arrayidx1, align 4
				; CHECK-NEXT: br label %for.inc

				for.inc:
				call void @h();
				store i32 %b, i32* %arrayidx1, align 4
				%inc = add nsw i64 %k.addr.019, 1
				%exitcond = icmp ne i64 %inc, %j
				br i1 %exitcond, label %for.body, label %for.end
				; Store moved here
				; CHECK: for.inc:
				; CHECK-NEXT: %b.new = phi i32 [ %b, %if.then ], [ %b.old, %for.inc.for.load ]
				; CHECK-NEXT: store i32 %b.new, i32* %arrayidx1, align 4

				for.end:
				%2 = load i32, i32* %a, align 4
				%arrayidx6 = getelementptr inbounds i32, i32* %a, i64 1
				%3 = load i32, i32* %arrayidx6, align 4
				%add = add nsw i32 %3, %2
				ret i32 %add
				}

				; "Invalidating" instruction along a path, but it does not invalidate local stores
				define i32 @f10(i64 %k, i64 %j, i32 %b) #0 {
				entry:
				%a = alloca i64, align 8
				%tmpcast = bitcast i64* %a to [2 x i32]*
				store i64 4294967296, i64* %a, align 8
				%p = bitcast i64* %a to i32*
				%arrayidx = getelementptr inbounds i32, i32* %p, i64 %k
				%arrayidx1 = getelementptr inbounds i32, i32* %p, i64 %j
				%cmp = icmp slt i64 %k, %j
				%add = add nsw i64 %j, %k
				%arrayidx2 = getelementptr inbounds i32, i32* %p, i64 %add
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%0 = load i32, i32* %arrayidx, align 4
				store i32 %0, i32* %arrayidx1, align 4
				; A local variable stays writable regardless of calls to arbitrary functions.
				; If it is non-escaping, no data races are possible either.
				call void @h()
				br label %if.end

				if.else:
				%1 = load i32, i32* %arrayidx2, align 4
				store i32 %1, i32* %arrayidx1, align 4
				br label %if.end

				if.end:
				%2 = load i32, i32* %arrayidx, align 4
				%cmp5 = icmp sgt i32 %2, %b
				br i1 %cmp5, label %if.then6, label %if.end8
				; CHECK: br i1 %cmp5, label %if.then6, label %if.end8.for.load

				if.then6:
				store i32 %b, i32* %arrayidx1, align 4
				br label %if.end8
				; Store moved out
				; CHECK: if.then6:
				; CHECK-NEXT: br label %if.end8

				; Load inserted
				; CHECK: if.end8.for.load:
				; CHECK-NEXT: %b.old = load i32, i32* %arrayidx1, align 4
				; CHECK-NEXT: br label %if.end8

				if.end8:
				%3 = load i32, i32* %p, align 4
				%arrayidx10 = getelementptr inbounds i32, i32* %p, i64 1
				%4 = load i32, i32* %arrayidx10, align 4
				%add11 = add nsw i32 %4, %3
				ret i32 %add11
				; Store moved here
				; CHECK: if.end8:
				; CHECK-NEXT: %b.new = phi i32 [ %b, %if.then6 ], [ %b.old, %if.end8.for.load ]
				; CHECK-NEXT: store i32 %b.new, i32* %arrayidx1, align 4
				}
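
The rewrite that the CHECK lines in @f10 verify can be pictured at source level roughly as follows. This is only an illustrative C++ analogue and is not part of the patch: the function names `store_conditional`, `store_unconditional` and `check_equivalence` are hypothetical, and the array contents are arbitrary. It assumes, as in @f10, a local non-escaping object that is written on every path before the triangle.

```cpp
#include <cassert>

// Before: the store to a[j] happens only when the condition holds
// (the if.then6 block in @f10).
int store_conditional(int k, int j, int b) {
    int a[2] = {0, 1};   // local, non-escaping object, like the alloca
    a[j] = a[k];         // a write to a[j] already exists on every path
    if (a[k] > b)
        a[j] = b;        // conditional store
    return a[0] + a[1];
}

// After: the store is unconditional and the stored value is selected
// per incoming edge, which is what the phi %b.new expresses.
int store_unconditional(int k, int j, int b) {
    int a[2] = {0, 1};
    a[j] = a[k];
    int merged;
    if (a[k] > b)
        merged = b;      // value arriving from if.then6
    else
        merged = a[j];   // %b.old, loaded on the newly inserted edge
    a[j] = merged;       // single unconditional store
    return a[0] + a[1];
}

// Compare both versions over all in-bounds indices and a few thresholds.
void check_equivalence() {
    for (int k = 0; k < 2; ++k)
        for (int j = 0; j < 2; ++j)
            for (int b = -2; b <= 2; ++b)
                assert(store_conditional(k, j, b) ==
                       store_unconditional(k, j, b));
}
```

Calling `check_equivalence()` exercises every in-bounds index pair. The sketch says nothing about the legality conditions for non-local objects, which the pass has to establish separately.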

llvm/test/Transforms/InstMerge/st_sink_split_bb.ll

	 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	 ; Test to make sure that a new block is inserted if we
	 ; have more than 2 predecessors for the block we're going to sink to.
	 ; RUN: opt -basic-aa -memdep -mldst-motion -S < %s | FileCheck %s --check-prefix=CHECK-NO
	 ; RUN: opt -debug-pass-manager -aa-pipeline=basic-aa -passes='require<memdep>,mldst-motion' -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NO,CHECK-INV-NO
	-; RUN: opt -debug-pass-manager -aa-pipeline=basic-aa -passes='require<memdep>,mldst-motion<split-footer-bb>' -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-YES,CHECK-INV-YES
	+; RUN: opt -debug-pass-manager -aa-pipeline=basic-aa -passes='require<memdep>,mldst-motion<split-footer-bb>' -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-YES,CHECK-INV-NO
	 target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

	 ; When passing split-footer-bb to MLSM, it invalidates CFG analyses
	 ; CHECK-INV-NO: Running pass: MergedLoadStoreMotionPass
	 ; CHECK-INV-NO-NOT: Invalidating analysis: DominatorTreeAnalysis
	-; CHECK-INV-YES: Running pass: MergedLoadStoreMotionPass
	-; CHECK-INV-YES: Invalidating analysis: DominatorTreeAnalysis

	 ; Function Attrs: nounwind uwtable
	 define dso_local void @st_sink_split_bb(i32* nocapture %arg, i32* nocapture %arg1, i1 zeroext %arg2, i1 zeroext %arg3) local_unnamed_addr {
	 ; CHECK-NO-LABEL: @st_sink_split_bb(
	 ; CHECK-NO-NEXT: bb:
	 ; CHECK-NO-NEXT: br i1 [[ARG2:%.*]], label [[BB4:%.*]], label [[BB5:%.*]]
	 ; CHECK-NO: bb4:
	 ; CHECK-NO-NEXT: store i32 1, i32* [[ARG:%.*]], align 4


Unit Tests: Failed


Diff 358202
