This patch adds a new feature to the memcpy optimizer known as the stack-move
optimization, intended primarily for the Rust language. It detects the pattern
whereby memory is copied from one stack slot to another stack slot in such a
way that the destination and source are neither captured nor simultaneously
live. In such cases, it optimizes the pattern by merging the two allocas into
one and deleting the memcpy.
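As a rough illustration (a hypothetical example, not taken from the patch or
its tests), the kind of Rust source that tends to lower to a stack-to-stack
memcpy is a by-value move of a large local into another local:

    // Hypothetical Rust source; names are illustrative only.
    pub fn sum_bytes() -> u64 {
        // `buf` gets its own stack slot.
        let buf = [7u8; 1024];
        // Without optimization, this move can lower to a second stack slot
        // plus a memcpy from `buf` into `moved`. Neither slot is captured,
        // and their live ranges don't overlap after the move, so the two
        // allocas can be merged and the memcpy deleted.
        let moved = buf;
        moved.iter().map(|&b| u64::from(b)).sum()
    }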
Without any changes to the frontend, this optimization eliminates over 17% of
the memcpys between stack slots in the self-hosted Rust compiler. The number
rises to approximately 42% if the Rust frontend is patched to apply nocapture
to all pointer-valued arguments. That patch is of course not a correct change
in general, but a Rust compiler built this way does work, which is evidence
that further improvements to the Rust frontend's nocapture deduction can
soundly push the fraction of memcpys that this optimization eliminates toward
the higher end of that range.
In order to determine that the optimization is safe to apply, this patch
performs a coarse-grained conservative liveness analysis of the two stack slots
on the basic block level. This may appear to be slow, but measurement has
demonstrated the compile-time overhead of this optimization to be approximately
0.07%, which seems well worth it for the large amount of stack traffic that it
eliminates in typical Rust code. Nevertheless, to further limit the
compile-time impact of this optimization, especially for other languages in
which it isn't expected to help as much, it has a frontend-configurable cap on
the number of basic blocks that it's allowed to examine.
The liveness analysis is built on top of the CaptureTracking framework so that
the pointer use analysis that capture tracking performs can be reused to
calculate live ranges. The actual analysis is the efficient
"variable-at-a-time" version of liveness analysis, which has a time complexity
of O(number of variables * number of instructions) in the worst case. Since we
only have two variables in this case, and we work on the basic block level
instead of on the instruction level (with the exception of the basic block
containing the memcpy itself), this reduces to O(number of basic blocks +
number of instructions in that one block). In practice, this seems to be
efficient enough to avoid a significant compile-time regression.
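To make the shape of the analysis concrete, here is a minimal sketch in Rust
(not the actual C++ implementation, and with the instruction-level handling of
the memcpy's own block omitted) of variable-at-a-time, basic-block-granularity
liveness:

    use std::collections::{HashMap, HashSet};

    // Sketch only: block ids stand in for LLVM basic blocks. Starting from
    // every block that uses the slot, walk predecessor edges backward,
    // stopping at the block that defines (allocates) the slot; every block
    // reached this way is part of the slot's live range.
    fn live_blocks(
        preds: &HashMap<u32, Vec<u32>>, // block id -> predecessor block ids
        def_block: u32,                 // block containing the alloca
        use_blocks: &HashSet<u32>,      // blocks containing uses of the slot
    ) -> HashSet<u32> {
        let mut live = use_blocks.clone();
        let mut worklist: Vec<u32> = use_blocks.iter().copied().collect();
        while let Some(bb) = worklist.pop() {
            if bb == def_block {
                continue; // don't walk above the definition
            }
            for &pred in preds.get(&bb).into_iter().flatten() {
                if live.insert(pred) {
                    worklist.push(pred);
                }
            }
        }
        live
    }

    // The slots may only be merged when their block-level live ranges are
    // disjoint (the memcpy's own block would be checked at instruction
    // granularity, which this sketch leaves out).
    fn may_merge(live_src: &HashSet<u32>, live_dst: &HashSet<u32>) -> bool {
        live_src.is_disjoint(live_dst)
    }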
Additionally, this optimization "shrink-wraps" any lifetime markers around the
allocas to the nearest common dominator and postdominator blocks. For this, it
needs the postdominator tree. This doesn't seem to cause a noticeable
difference in compile times, because the postdominator tree is needed by dead
store elimination anyway, which typically follows memcpy optimization.
Nonetheless, to be maximally cautious regarding compile time, we only require
the postdominator tree if the optimization is enabled.
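For reference, the nearest-common-dominator part of that shrink-wrapping can
be sketched as the usual depth-based walk up the dominator tree (again a
simplified Rust sketch, not the patch's C++ code; the postdominator side is
symmetric):

    use std::collections::HashMap;

    // Sketch only: given each block's immediate dominator and its depth in
    // the dominator tree, walk the deeper block up until both blocks are at
    // the same depth, then walk both up in lockstep until they meet. The
    // lifetime.start marker would be hoisted to the nearest common dominator
    // of the merged slot's uses; lifetime.end placement would use the same
    // walk on the postdominator tree.
    fn nearest_common_dominator(
        idom: &HashMap<u32, u32>,  // block id -> immediate dominator
        depth: &HashMap<u32, u32>, // block id -> depth in the dominator tree
        mut a: u32,
        mut b: u32,
    ) -> u32 {
        while depth[&a] > depth[&b] {
            a = idom[&a];
        }
        while depth[&b] > depth[&a] {
            b = idom[&b];
        }
        while a != b {
            a = idom[&a];
            b = idom[&b];
        }
        a
    }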
Let's avoid yet another off-by-default optimization.