This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/lib/Transforms/Scalar/EarlyCSE.cpp
llvm/lib/Transforms/Scalar/GVN.cpp
llvm/test/Transforms/Coroutines/coro-no-value-reuse-across-suspend.ll

Differential D89711

[Coroutine] Prevent value reusing across coroutine suspensions in EarlyCSE and GVN
Changes Planned · Public

Authored by lxfind on Oct 19 2020, 10:02 AM.

Details

Reviewers

wenlei
efriedma
nikic
fhahn

Summary

There are two problems with reusing values across coroutine suspensions:

It has been assumed that all the code within the same function runs on the same thread. For example, the glibc library function pthread_self is defined with __attribute__((const)) (which would have the readnone attribute in LLVM IR) even though it in fact reads global memory. This was OK because, in a non-coroutine function, the thread ID never changes while the function executes. Optimizers take advantage of that and reuse the result if there are multiple calls to pthread_self within the same function. With coroutines, however, this is no longer true: we cannot reuse the result of pthread_self across a suspension point, because the code before and after the suspension could be running on different threads.
Any data that needs to live across coroutine suspensions has to be spilled onto the coroutine frame (heap). Reusing values across suspensions would generate lots of data that needs to live on the frame. This can be expensive and makes the frame unnecessarily large in common cases.
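
To make the first problem concrete, here is a minimal sketch in plain C++. It uses neither real threads nor real coroutines: `ExecutingThread`, `modelThreadSelf`, and `suspendAndResumeOn` are hypothetical stand-ins introduced only for illustration, and the "reuse" is written by hand to mimic what CSE of a readnone call would produce.

```cpp
#include <cassert>

// Hypothetical model of "which thread is currently executing this code".
// A real program would obtain this from pthread_self().
static int ExecutingThread = 1;

// Stand-in for pthread_self(): no arguments, treated as readnone by the
// optimizer, yet the result depends on the executing thread.
int modelThreadSelf() { return ExecutingThread; }

// Stand-in for a suspension point after which a different thread resumes
// the coroutine.
void suspendAndResumeOn(int Thread) { ExecutingThread = Thread; }

// What the source program asks for: query the thread again after resuming.
bool seesSameThreadWithoutReuse() {
  ExecutingThread = 1;
  int Before = modelThreadSelf();
  suspendAndResumeOn(2);
  int After = modelThreadSelf();
  return Before == After;
}

// What value reuse amounts to: the second call is replaced by the result of
// the first one, so the code after the suspension sees a stale thread ID.
bool seesSameThreadWithReuse() {
  ExecutingThread = 1;
  int Before = modelThreadSelf();
  suspendAndResumeOn(2);
  int After = Before; // reused result, no second call
  return Before == After;
}
```

In this model, `seesSameThreadWithoutReuse()` reports a thread change while `seesSameThreadWithReuse()` does not, which is the miscompile the summary describes.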

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	290 ms	linux > LLVM.Transforms/Coroutines::ArgAddr.ll
	50 ms	linux > LLVM.Transforms/Coroutines::coro-no-value-reuse-across-suspend.ll
	270 ms	linux > LLVM.Transforms/Coroutines::coro-split-01.ll
	280 ms	linux > LLVM.Transforms/Coroutines::ex1.ll
	290 ms	linux > LLVM.Transforms/Coroutines::ex2.ll
		View Full Test Results (16 Failed)

Event Timeline

lxfind created this revision. Oct 19 2020, 10:02 AM

Herald added a project: Restricted Project. Oct 19 2020, 10:02 AM

Herald added subscribers: llvm-commits, modimo, jfb and 2 others.

lxfind requested review of this revision. Oct 19 2020, 10:02 AM

Harbormaster completed remote builds in B75554: Diff 299084. Oct 19 2020, 10:44 AM

rjmccall added inline comments. Oct 19 2020, 10:49 AM

llvm/lib/Transforms/Scalar/GVN.cpp
2141	Different coroutine lowerings use different suspend instructions. Can you write this (and similar conditions elsewhere in the patch) so that it'll apply in any lowering? Maybe add a `canSuspendCoroutine` method to `IntrinsicInst`.
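
A self-contained sketch of what such a predicate could look like follows. It does not use LLVM headers: `IntrinsicKind` is a hypothetical stand-in for `llvm::Intrinsic::ID`, the free function only models the suggested `canSuspendCoroutine` method, and the set of suspend intrinsics listed is an assumption that may be incomplete.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Intrinsic::ID, restricted to the few
// intrinsics this sketch needs.
enum class IntrinsicKind {
  CoroSuspend,       // llvm.coro.suspend (switch lowering)
  CoroSuspendRetcon, // llvm.coro.suspend.retcon (returned-continuation lowering)
  CoroSave,          // llvm.coro.save, which does not itself suspend
  Assume             // llvm.assume
};

// Models the suggested query: true for any intrinsic that can suspend the
// coroutine, regardless of which lowering is in use.
bool canSuspendCoroutine(IntrinsicKind Kind) {
  switch (Kind) {
  case IntrinsicKind::CoroSuspend:
  case IntrinsicKind::CoroSuspendRetcon:
    return true;
  case IntrinsicKind::CoroSave:
  case IntrinsicKind::Assume:
    return false;
  }
  return false;
}
```

Passes would then ask this one question instead of matching `coro_suspend` directly, so a new lowering only needs to be added in one place.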

For the first point, if the IR definition isn't consistent, I'd prefer to actually fix that, instead of work around it. There are a lot of places that assume alias analysis is accurate.

For example, the glibc library function pthread_self is defined with __attribute__((const)) (which would have the readnone attribute in LLVM IR) even though it in fact reads global memory.

Can we fix up the attributes on pthread_self in clang? Or are we more generally concerned with people marking their functions const?

On a related note, we probably need to change the representation of references to thread-local variables.

For the second point, it isn't obvious to me that disabling CSE is universally profitable. We can actually end up reducing the number of values live across the suspend point in some cases. And it seems simpler to teach coroutine lowering to rematerialize instructions when it's profitable.

In D89711#2339394, @efriedma wrote:

For the first point, if the IR definition isn't consistent, I'd prefer to actually fix that, instead of work around it. There are a lot of places that assume alias analysis is accurate.

For example, the glibc library function pthread_self is defined with __attribute__((const)) (which would have the readnone attribute in LLVM IR) even though it in fact reads global memory.

Can we fix up the attributes on pthread_self in clang? Or are we more generally concerned with people marking their functions const?

On a related note, we probably need to change the representation of references to thread-local variables.

For the second point, it isn't obvious to me that disabling CSE is universally profitable. We can actually end up reducing the number of values live across the suspend point in some cases. And it seems simpler to teach coroutine lowering to rematerialize instructions when it's profitable.

Thanks for the suggestions.
On the first point, I could certainly try fixing up pthread_self in Clang, but that would also mean we would never be able to optimize out redundant pthread_self() calls. I am not sure if that's acceptable in general. So far I have only found pthread_self() to be problematic, but I am worried there might be more things like this that I am not aware of. Anyway, I could give it a try.

And yes thread local variables are another set of problems. I don't have a solution yet on how to handle them.

The second point makes sense to me.

but that would also mean we would never be able to optimize out redundant pthread_self() calls

We can probably mess with alias analysis so it understands that pthread_self doesn't alias operations other than calls to a coroutine suspend; that should be enough to recover the relevant optimizations. Not sure if we want to add some sort of IR attribute, or just special-case that specific library call using TargetLibraryInfo.

And yes thread local variables are another set of problems. I don't have a solution yet on how to handle them.

We probably need an intrinsic that computes the runtime address of a thread-local variable, so we compute the address at some specific point in the function.

In D89711#2339528, @efriedma wrote:

but that would also mean we would never be able to optimize out redundant pthread_self() calls

We can probably mess with alias analysis so it understands that pthread_self doesn't alias operations other than calls to a coroutine suspend; that should be enough to recover the relevant optimizations. Not sure if we want to add some sort of IR attribute, or just special-case that specific library call using TargetLibraryInfo.

And yes thread local variables are another set of problems. I don't have a solution yet on how to handle them.

We probably need an intrinsic that computes the runtime address of a thread-local variable, so we compute the address at some specific point in the function.

After thinking about it more:
First of all, we cannot drop the readnone tag in the definition of pthread_self in Clang; the regression in the non-coroutine cases is likely unacceptable, and users shouldn't pay for it if they are not using coroutines.
Secondly, because one can call pthread_self through indirect function calls, just checking for pthread_self in coroutines is not sufficient. Instead, in a coroutine function, we never want to reuse the results of function calls.
There doesn't seem to be a way to tag a call site as possibly accessing memory (except through operand bundles, which don't seem to fit here), so it seems to me there are only two possible solutions:

Rewrite the Clang frontend for coroutines so that it directly emits multiple functions for each suspension region. This eliminates the problem, but optimizing across multiple functions that in fact belong to one will be quite challenging, and the change will be very significant.
In all the relevant passes that would reuse call results (EarlyCSE and GVN as far as I know, but do let me know if there are others), do not reuse call results within a coroutine.

Since the first solution is way too heavy and has a lot of downsides, the second solution seems the way to go. It will be basically along the lines of this patch, but will limit the damage to only call result sharing, not other expressions. What do you think?

First of all, we cannot drop the readnone tag in the definition of pthread_self in Clang; the regression in the non-coroutine cases is likely unacceptable, and users shouldn't pay for it if they are not using coroutines.

Like I noted before, we can recover the optimization power by special-casing it in alias analysis. For almost all optimizations, it doesn't matter that it reads memory if that read doesn't actually alias anything.

Secondly, because one can call pthread_self through indirect function calls, just checking for pthread_self in coroutines is not sufficient. Instead, in a coroutine function, we never want to reuse the results of function calls.

This depends on what we decide the "const" attribute actually means, in the context of coroutines. We could decide that const actually means the value can't depend on the thread ID, then treat fixing up the libc header as a compatibility hack. Or we could decide that "const" is actually allowed to change across a coroutine suspend, in which case we need to do something more invasive.

Either way, I'd still prefer that the IR readnone exclude functions that behave like pthread_self, I think.

In all the relevant passes that would reuse call results (EarlyCSE and GVN as far as I know, but do let me know if there are others), do not reuse call results within a coroutine.

We also have a few other GVN-ish passes: NewGVN, GVNHoist and GVNSink. SimplifyCFG does a form of CSE, although I'm not sure it actually affects this case. LICM also reuses call results in a sense that's relevant here.

Even if we do catch all the cases that are relevant right now, having an "extra" form of memory access that isn't visible to alias analysis makes understanding existing code, and writing new code, harder.

I think this discussion is getting to the point where it would benefit from a wider audience on llvm-dev.

In D89711#2339528, @efriedma wrote:

but that would also mean we would never be able to optimize out redundant pthread_self() calls

We can probably mess with alias analysis so it understands that pthread_self doesn't alias operations other than calls to a coroutine suspend; that should be enough to recover the relevant optimizations. Not sure if we want to add some sort of IR attribute, or just special-case that specific library call using TargetLibraryInfo.

And yes thread local variables are another set of problems. I don't have a solution yet on how to handle them.

We probably need an intrinsic that computes the runtime address of a thread-local variable, so we compute the address at some specific point in the function.

After thinking about it more:
First of all, we cannot drop the readnone tag in the definition of pthread_self in Clang; the regression in the non-coroutine cases is likely unacceptable, and users shouldn't pay for it if they are not using coroutines.
Secondly, because one can call pthread_self through indirect function calls, just checking for pthread_self in coroutines is not sufficient. Instead, in a coroutine function, we never want to reuse the results of function calls.
There doesn't seem to be a way to tag a call site as possibly accessing memory (except through operand bundles, which don't seem to fit here), so it seems to me there are only two possible solutions:

Rewrite the Clang frontend for coroutines so that it directly emits multiple functions for each suspension region. This eliminates the problem, but optimizing across multiple functions that in fact belong to one will be quite challenging, and the change will be very significant.
In all the relevant passes that would reuse call results (EarlyCSE and GVN as far as I know, but do let me know if there are others), do not reuse call results within a coroutine.

In D89711#2342389, @efriedma wrote:

First of all, we cannot drop the readnone tag in the definition of pthread_self in Clang; the regression in the non-coroutine cases is likely unacceptable, and users shouldn't pay for it if they are not using coroutines.

Like I noted before, we can recover the optimization power by special-casing it in alias analysis. For almost all optimizations, it doesn't matter that it reads memory if that read doesn't actually alias anything.

Secondly, because one can call pthread_self through indirect function calls, just checking for pthread_self in coroutines is not sufficient. Instead, in a coroutine function, we never want to reuse the results of function calls.

This depends on what we decide the "const" attribute actually means, in the context of coroutines. We could decide that const actually means the value can't depend on the thread ID, then treat fixing up the libc header as a compatibility hack. Or we could decide that "const" is actually allowed to change across a coroutine suspend, in which case we need to do something more invasive.

Either way, I'd still prefer that the IR readnone exclude functions that behave like pthread_self, I think.

In all the relevant passes that would reuse call results (EarlyCSE and GVN as far as I know, but do let me know if there are others), do not reuse call results within a coroutine.

We also have a few other GVN-ish passes: NewGVN, GVNHoist and GVNSink. SimplifyCFG does a form of CSE, although I'm not sure it actually affects this case. LICM also reuses call results in a sense that's relevant here.

Even if we do catch all the cases that are relevant right now, having an "extra" form of memory access that isn't visible to alias analysis makes understanding existing code, and writing new code, harder.

I think this discussion is getting to the point where it would benefit from a wider audience on llvm-dev.

Thanks. I think I get it now. I will write up something in more detail and post it to llvm-dev.
To summarize:

In Clang, we special-case the pthread_self() function declaration and remove the readnone attribute.
In alias analysis, we make it so that pthread_self() only interferes with coroutine suspension intrinsics.
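
The effect of the second item can be sketched with a small model. This is not LLVM's alias analysis API: `Op`, `clobbersThreadIdentity`, and `countRequiredThreadSelfCalls` are hypothetical names, and the function body is reduced to a straight-line list of operations. The sketch only illustrates the idea that a pthread_self() result stays reusable across ordinary stores and calls and is invalidated only by a suspend.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified operations in a straight-line function body.
enum class Op { ThreadSelfCall, Store, OtherCall, Suspend };

// The assumed alias rule: only a coroutine suspend may change what the
// thread-identity call returns.
bool clobbersThreadIdentity(Op O) { return O == Op::Suspend; }

// Counts how many ThreadSelfCall operations must remain after redundant
// ones are replaced by an earlier, still-valid result.
int countRequiredThreadSelfCalls(const std::vector<Op> &Body) {
  bool HaveAvailableResult = false;
  int Required = 0;
  for (Op O : Body) {
    if (O == Op::ThreadSelfCall) {
      if (!HaveAvailableResult) {
        ++Required; // no valid earlier result, so this call must stay
        HaveAvailableResult = true;
      }
    } else if (clobbersThreadIdentity(O)) {
      HaveAvailableResult = false; // result may be stale after resuming
    }
  }
  return Required;
}
```

Under this rule, non-coroutine code keeps the existing optimization (one call survives no matter how many stores or calls sit in between), while a suspend forces a fresh call.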

lxfind planned changes to this revision. Nov 10 2020, 10:00 AM

Revision Contents

	Path	Size
	llvm/lib/Transforms/Scalar/EarlyCSE.cpp	126 lines
	llvm/lib/Transforms/Scalar/GVN.cpp	10 lines
	llvm/test/Transforms/Coroutines/coro-no-value-reuse-across-suspend.ll	62 lines

Diff 299084
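
The EarlyCSE part of this diff tags each available value with a generation and bumps a value generation at every `coro_suspend`. The following standalone sketch illustrates that idea under stated assumptions: `AvailableValueTable` and its methods are hypothetical, `int` and `std::string` stand in for `Value *` and `SimpleValue`, and the generation check on lookup is an assumption about the part of the patch not shown in this excerpt.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Same shape as the patch's GenerationalValue, with int standing in for
// llvm::Value *.
struct GenerationalValue {
  int V = 0;
  unsigned Generation = 0;
};

// Hypothetical, flat (unscoped) model of the available-values table.
class AvailableValueTable {
public:
  // Called when a suspension point is encountered.
  void noteSuspend() { ++ValueGeneration; }

  // Records a value together with the generation it was computed in.
  void insert(const std::string &Expr, int V) {
    GenerationalValue &Slot = Table[Expr];
    Slot.V = V;
    Slot.Generation = ValueGeneration;
  }

  // Succeeds only if the expression was recorded in the current generation,
  // i.e. no suspension point lies between the definition and the reuse.
  bool lookup(const std::string &Expr, int &Out) const {
    auto It = Table.find(Expr);
    if (It == Table.end() || It->second.Generation != ValueGeneration)
      return false;
    Out = It->second.V;
    return true;
  }

private:
  std::unordered_map<std::string, GenerationalValue> Table;
  unsigned ValueGeneration = 0;
};

// Usage: the same expression is looked up before and after a suspend.
bool lookupHitsBeforeSuspend() {
  AvailableValueTable T;
  T.insert("a + b", 7);
  int Out = 0;
  return T.lookup("a + b", Out) && Out == 7;
}

bool lookupHitsAfterSuspend() {
  AvailableValueTable T;
  T.insert("a + b", 7);
  T.noteSuspend();
  int Out = 0;
  return T.lookup("a + b", Out);
}
```

Keeping the value generation separate from the memory generation, as the patch does with `GenerationPair`, lets memory invalidation and suspension points be tracked independently.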

llvm/lib/Transforms/Scalar/EarlyCSE.cpp

[528 lines not shown]	public:
const TargetLibraryInfo &TLI;		const TargetLibraryInfo &TLI;
const TargetTransformInfo &TTI;		const TargetTransformInfo &TTI;
DominatorTree &DT;		DominatorTree &DT;
AssumptionCache &AC;		AssumptionCache &AC;
const SimplifyQuery SQ;		const SimplifyQuery SQ;
MemorySSA *MSSA;		MemorySSA *MSSA;
std::unique_ptr<MemorySSAUpdater> MSSAUpdater;		std::unique_ptr<MemorySSAUpdater> MSSAUpdater;

		/// Pair current values with a generation number in order to prevent
		/// reusing values across coroutine suspension points. Value reuse typically
		/// assumes that execution within the same function happens
		/// on the same thread, which is no longer true after a coroutine suspension.
		/// For example, pthread_self() from glibc is defined as readnone, but cannot
		/// be CSE-ed across coroutine suspension points.
		/// Furthermore, value reuse generates local variables that need to stay
		/// alive across suspension points, leading to coroutine frame size increase.
		struct GenerationalValue {
		Value *V = nullptr;
		unsigned Generation = 0;
		};

using AllocatorTy =		using AllocatorTy =
RecyclingAllocator<BumpPtrAllocator,		RecyclingAllocator<BumpPtrAllocator,
ScopedHashTableVal<SimpleValue, Value *>>;		ScopedHashTableVal<SimpleValue, GenerationalValue>>;
using ScopedHTType =		using ScopedHTType = ScopedHashTable<SimpleValue, GenerationalValue,
ScopedHashTable<SimpleValue, Value *, DenseMapInfo<SimpleValue>,		DenseMapInfo<SimpleValue>, AllocatorTy>;
AllocatorTy>;

/// A scoped hash table of the current values of all of our simple		/// A scoped hash table of the current values of all of our simple
/// scalar expressions.		/// scalar expressions.
///		///
/// As we walk down the domtree, we look to see if instructions are in this:		/// As we walk down the domtree, we look to see if instructions are in this:
/// if so, we replace them with what we find, otherwise we insert them so		/// if so, we replace them with what we find, otherwise we insert them so
/// that dominated values can succeed in their lookup.		/// that dominated values can succeed in their lookup.
ScopedHTType AvailableValues;		ScopedHTType AvailableValues;
[48 lines not shown]	public:
/// A scoped hash table of the current values of read-only call		/// A scoped hash table of the current values of read-only call
/// values.		/// values.
///		///
/// It uses the same generation count as loads.		/// It uses the same generation count as loads.
using CallHTType =		using CallHTType =
ScopedHashTable<CallValue, std::pair<Instruction *, unsigned>>;		ScopedHashTable<CallValue, std::pair<Instruction *, unsigned>>;
CallHTType AvailableCalls;		CallHTType AvailableCalls;

/// This is the current generation of the memory value.		struct GenerationPair {
unsigned CurrentGeneration = 0;		unsigned MemoryGeneration = 0;
		unsigned ValueGeneration = 0;
		};

		/// This is the current generation of memory and expression values.
		GenerationPair CurrentGeneration;

/// Set up the EarlyCSE runner for a particular function.		/// Set up the EarlyCSE runner for a particular function.
EarlyCSE(const DataLayout &DL, const TargetLibraryInfo &TLI,		EarlyCSE(const DataLayout &DL, const TargetLibraryInfo &TLI,
const TargetTransformInfo &TTI, DominatorTree &DT,		const TargetTransformInfo &TTI, DominatorTree &DT,
AssumptionCache &AC, MemorySSA *MSSA)		AssumptionCache &AC, MemorySSA *MSSA)
: TLI(TLI), TTI(TTI), DT(DT), AC(AC), SQ(DL, &TLI, &DT, &AC), MSSA(MSSA),		: TLI(TLI), TTI(TTI), DT(DT), AC(AC), SQ(DL, &TLI, &DT, &AC), MSSA(MSSA),
MSSAUpdater(std::make_unique<MemorySSAUpdater>(MSSA)) {}		MSSAUpdater(std::make_unique<MemorySSAUpdater>(MSSA)) {}

[23 lines not shown]	private:
// Contains all the needed information to create a stack for doing a depth		// Contains all the needed information to create a stack for doing a depth
// first traversal of the tree. This includes scopes for values, loads, and		// first traversal of the tree. This includes scopes for values, loads, and
// calls as well as the generation. There is a child iterator so that the		// calls as well as the generation. There is a child iterator so that the
// children do not need to be stored separately.		// children do not need to be stored separately.
class StackNode {		class StackNode {
public:		public:
StackNode(ScopedHTType &AvailableValues, LoadHTType &AvailableLoads,		StackNode(ScopedHTType &AvailableValues, LoadHTType &AvailableLoads,
InvariantHTType &AvailableInvariants, CallHTType &AvailableCalls,		InvariantHTType &AvailableInvariants, CallHTType &AvailableCalls,
unsigned cg, DomTreeNode *n, DomTreeNode::const_iterator child,		GenerationPair gp, DomTreeNode *n,
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'gp' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'n' [readability-identifier-naming]
		DomTreeNode::const_iterator child,
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'child' [readability-identifier-naming]
DomTreeNode::const_iterator end)		DomTreeNode::const_iterator end)
: CurrentGeneration(cg), ChildGeneration(cg), Node(n), ChildIter(child),		: CurrentGeneration(gp), ChildGeneration(gp), Node(n), ChildIter(child),
EndIter(end),		EndIter(end), Scopes(AvailableValues, AvailableLoads,
Scopes(AvailableValues, AvailableLoads, AvailableInvariants,		AvailableInvariants, AvailableCalls) {}
AvailableCalls)
{}
StackNode(const StackNode &) = delete;		StackNode(const StackNode &) = delete;
StackNode &operator=(const StackNode &) = delete;		StackNode &operator=(const StackNode &) = delete;

// Accessors.		// Accessors.
unsigned currentGeneration() { return CurrentGeneration; }		GenerationPair currentGeneration() { return CurrentGeneration; }
unsigned childGeneration() { return ChildGeneration; }		GenerationPair childGeneration() { return ChildGeneration; }
void childGeneration(unsigned generation) { ChildGeneration = generation; }		void childGeneration(GenerationPair generation) {
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'generation' [readability-identifier-naming]
		ChildGeneration = generation;
		}
DomTreeNode *node() { return Node; }		DomTreeNode *node() { return Node; }
DomTreeNode::const_iterator childIter() { return ChildIter; }		DomTreeNode::const_iterator childIter() { return ChildIter; }

DomTreeNode *nextChild() {		DomTreeNode *nextChild() {
DomTreeNode *child = *ChildIter;		DomTreeNode *child = *ChildIter;
++ChildIter;		++ChildIter;
return child;		return child;
}		}

DomTreeNode::const_iterator end() { return EndIter; }		DomTreeNode::const_iterator end() { return EndIter; }
bool isProcessed() { return Processed; }		bool isProcessed() { return Processed; }
void process() { Processed = true; }		void process() { Processed = true; }

private:		private:
unsigned CurrentGeneration;		GenerationPair CurrentGeneration;
unsigned ChildGeneration;		GenerationPair ChildGeneration;
DomTreeNode *Node;		DomTreeNode *Node;
DomTreeNode::const_iterator ChildIter;		DomTreeNode::const_iterator ChildIter;
DomTreeNode::const_iterator EndIter;		DomTreeNode::const_iterator EndIter;
NodeScope Scopes;		NodeScope Scopes;
bool Processed = false;		bool Processed = false;
};		};

/// Wrapper class to handle memory instructions, including loads,		/// Wrapper class to handle memory instructions, including loads,
[139 lines not shown]	private:
}		}

bool processNode(DomTreeNode *Node);		bool processNode(DomTreeNode *Node);

bool handleBranchCondition(Instruction *CondInst, const BranchInst *BI,		bool handleBranchCondition(Instruction *CondInst, const BranchInst *BI,
const BasicBlock *BB, const BasicBlock *Pred);		const BasicBlock *BB, const BasicBlock *Pred);

Value *getMatchingValue(LoadValue &InVal, ParseMemoryInst &MemInst,		Value *getMatchingValue(LoadValue &InVal, ParseMemoryInst &MemInst,
unsigned CurrentGeneration);		unsigned CurrentMemoryGeneration);

bool overridingStores(const ParseMemoryInst &Earlier,		bool overridingStores(const ParseMemoryInst &Earlier,
const ParseMemoryInst &Later);		const ParseMemoryInst &Later);

Value *getOrCreateResult(Value *Inst, Type *ExpectedType) const {		Value *getOrCreateResult(Value *Inst, Type *ExpectedType) const {
if (auto *LI = dyn_cast<LoadInst>(Inst))		if (auto *LI = dyn_cast<LoadInst>(Inst))
return LI;		return LI;
if (auto *SI = dyn_cast<StoreInst>(Inst))		if (auto *SI = dyn_cast<StoreInst>(Inst))
return SI->getValueOperand();		return SI->getValueOperand();
assert(isa<IntrinsicInst>(Inst) && "Instruction not supported");		assert(isa<IntrinsicInst>(Inst) && "Instruction not supported");
auto *II = cast<IntrinsicInst>(Inst);		auto *II = cast<IntrinsicInst>(Inst);
if (isHandledNonTargetIntrinsic(II->getIntrinsicID()))		if (isHandledNonTargetIntrinsic(II->getIntrinsicID()))
return getOrCreateResultNonTargetMemIntrinsic(II, ExpectedType);		return getOrCreateResultNonTargetMemIntrinsic(II, ExpectedType);
return TTI.getOrCreateResultFromMemIntrinsic(II, ExpectedType);		return TTI.getOrCreateResultFromMemIntrinsic(II, ExpectedType);
}		}

Value *getOrCreateResultNonTargetMemIntrinsic(IntrinsicInst *II,		static Value *getOrCreateResultNonTargetMemIntrinsic(IntrinsicInst *II,
Type *ExpectedType) const {		Type *ExpectedType) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
case Intrinsic::masked_load:		case Intrinsic::masked_load:
return II;		return II;
case Intrinsic::masked_store:		case Intrinsic::masked_store:
return II->getOperand(0);		return II->getOperand(0);
}		}
return nullptr;		return nullptr;
}		}
[215 lines not shown]	bool EarlyCSE::handleBranchCondition(Instruction *CondInst,

bool MadeChanges = false;		bool MadeChanges = false;
SmallVector<Instruction *, 4> WorkList;		SmallVector<Instruction *, 4> WorkList;
SmallPtrSet<Instruction *, 4> Visited;		SmallPtrSet<Instruction *, 4> Visited;
WorkList.push_back(CondInst);		WorkList.push_back(CondInst);
while (!WorkList.empty()) {		while (!WorkList.empty()) {
Instruction *Curr = WorkList.pop_back_val();		Instruction *Curr = WorkList.pop_back_val();

AvailableValues.insert(Curr, TorF);		AvailableValues.insert(Curr, {TorF, CurrentGeneration.ValueGeneration});
LLVM_DEBUG(dbgs() << "EarlyCSE CVP: Add conditional value for '"		LLVM_DEBUG(dbgs() << "EarlyCSE CVP: Add conditional value for '"
<< Curr->getName() << "' as " << *TorF << " in "		<< Curr->getName() << "' as " << *TorF << " in "
<< BB->getName() << "\n");		<< BB->getName() << "\n");
if (!DebugCounter::shouldExecute(CSECounter)) {		if (!DebugCounter::shouldExecute(CSECounter)) {
LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");		LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");
} else {		} else {
// Replace all dominated uses with the known value.		// Replace all dominated uses with the known value.
if (unsigned Count = replaceDominatedUsesWith(Curr, TorF, DT,		if (unsigned Count = replaceDominatedUsesWith(Curr, TorF, DT,
[9 lines not shown]	if (MatchBinOp(Curr, PropagateOpcode))
if (SimpleValue::canHandle(OPI) && Visited.insert(OPI).second)		if (SimpleValue::canHandle(OPI) && Visited.insert(OPI).second)
WorkList.push_back(OPI);		WorkList.push_back(OPI);
}		}

return MadeChanges;		return MadeChanges;
}		}

Value *EarlyCSE::getMatchingValue(LoadValue &InVal, ParseMemoryInst &MemInst,		Value *EarlyCSE::getMatchingValue(LoadValue &InVal, ParseMemoryInst &MemInst,
unsigned CurrentGeneration) {		unsigned CurrentMemoryGeneration) {
if (InVal.DefInst == nullptr)		if (InVal.DefInst == nullptr)
return nullptr;		return nullptr;
if (InVal.MatchingId != MemInst.getMatchingId())		if (InVal.MatchingId != MemInst.getMatchingId())
return nullptr;		return nullptr;
// We don't yet handle removing loads with ordering of any kind.		// We don't yet handle removing loads with ordering of any kind.
if (MemInst.isVolatile() || !MemInst.isUnordered())		if (MemInst.isVolatile() || !MemInst.isUnordered())
return nullptr;		return nullptr;
// We can't replace an atomic load with one which isn't also atomic.		// We can't replace an atomic load with one which isn't also atomic.
[22 lines not shown]	if (OtherNTI != MatchingNTI)
return nullptr;		return nullptr;
if (OtherNTI && MatchingNTI) {		if (OtherNTI && MatchingNTI) {
if (!isNonTargetIntrinsicMatch(cast<IntrinsicInst>(InVal.DefInst),		if (!isNonTargetIntrinsicMatch(cast<IntrinsicInst>(InVal.DefInst),
cast<IntrinsicInst>(MemInst.get())))		cast<IntrinsicInst>(MemInst.get())))
return nullptr;		return nullptr;
}		}

if (!isOperatingOnInvariantMemAt(MemInst.get(), InVal.Generation) &&		if (!isOperatingOnInvariantMemAt(MemInst.get(), InVal.Generation) &&
!isSameMemGeneration(InVal.Generation, CurrentGeneration, InVal.DefInst,		!isSameMemGeneration(InVal.Generation, CurrentMemoryGeneration,
MemInst.get()))		InVal.DefInst, MemInst.get()))
return nullptr;		return nullptr;

if (!Result)		if (!Result)
Result = getOrCreateResult(Matching, Other->getType());		Result = getOrCreateResult(Matching, Other->getType());
return Result;		return Result;
}		}

bool EarlyCSE::overridingStores(const ParseMemoryInst &Earlier,		bool EarlyCSE::overridingStores(const ParseMemoryInst &Earlier,
[33 lines not shown]	bool EarlyCSE::processNode(DomTreeNode *Node) {

// If this block has a single predecessor, then the predecessor is the parent		// If this block has a single predecessor, then the predecessor is the parent
// of the domtree node and all of the live out memory values are still current		// of the domtree node and all of the live out memory values are still current
// in this block. If this block has multiple predecessors, then they could		// in this block. If this block has multiple predecessors, then they could
// have invalidated the live-out memory values of our parent value. For now,		// have invalidated the live-out memory values of our parent value. For now,
// just be conservative and invalidate memory if this block has multiple		// just be conservative and invalidate memory if this block has multiple
// predecessors.		// predecessors.
if (!BB->getSinglePredecessor())		if (!BB->getSinglePredecessor())
++CurrentGeneration;		++CurrentGeneration.MemoryGeneration;

// If this node has a single predecessor which ends in a conditional branch,		// If this node has a single predecessor which ends in a conditional branch,
// we can infer the value of the branch condition given that we took this		// we can infer the value of the branch condition given that we took this
// path. We need the single predecessor to ensure there's not another path		// path. We need the single predecessor to ensure there's not another path
// which reaches this block where the condition might hold a different		// which reaches this block where the condition might hold a different
// value. Since we're adding this to the scoped hash table (like any other		// value. Since we're adding this to the scoped hash table (like any other
// def), it will have been popped if we encounter a future merge block.		// def), it will have been popped if we encounter a future merge block.
if (BasicBlock *Pred = BB->getSinglePredecessor()) {		if (BasicBlock *Pred = BB->getSinglePredecessor()) {
[36 lines not shown]	for (Instruction &Inst : make_early_inc_range(BB->getInstList())) {
// and this pass will not bother with its removal. However, we should mark		// and this pass will not bother with its removal. However, we should mark
// its condition as true for all dominated blocks.		// its condition as true for all dominated blocks.
if (match(&Inst, m_Intrinsic<Intrinsic::assume>())) {		if (match(&Inst, m_Intrinsic<Intrinsic::assume>())) {
auto *CondI =		auto *CondI =
dyn_cast<Instruction>(cast<CallInst>(Inst).getArgOperand(0));		dyn_cast<Instruction>(cast<CallInst>(Inst).getArgOperand(0));
if (CondI && SimpleValue::canHandle(CondI)) {		if (CondI && SimpleValue::canHandle(CondI)) {
LLVM_DEBUG(dbgs() << "EarlyCSE considering assumption: " << Inst		LLVM_DEBUG(dbgs() << "EarlyCSE considering assumption: " << Inst
<< '\n');		<< '\n');
AvailableValues.insert(CondI, ConstantInt::getTrue(BB->getContext()));		AvailableValues.insert(CondI, {ConstantInt::getTrue(BB->getContext()),
		CurrentGeneration.ValueGeneration});
} else		} else
LLVM_DEBUG(dbgs() << "EarlyCSE skipping assumption: " << Inst << '\n');		LLVM_DEBUG(dbgs() << "EarlyCSE skipping assumption: " << Inst << '\n');
continue;		continue;
}		}

// Skip sideeffect intrinsics, for the same reason as assume intrinsics.		// Skip sideeffect intrinsics, for the same reason as assume intrinsics.
if (match(&Inst, m_Intrinsic<Intrinsic::sideeffect>())) {		if (match(&Inst, m_Intrinsic<Intrinsic::sideeffect>())) {
LLVM_DEBUG(dbgs() << "EarlyCSE skipping sideeffect: " << Inst << '\n');		LLVM_DEBUG(dbgs() << "EarlyCSE skipping sideeffect: " << Inst << '\n');
continue;		continue;
}		}

		if (match(&Inst, m_Intrinsic<Intrinsic::coro_suspend>())) {
		++CurrentGeneration.ValueGeneration;
		}

// We can skip all invariant.start intrinsics since they only read memory,		// We can skip all invariant.start intrinsics since they only read memory,
// and we can forward values across it. For invariant starts without		// and we can forward values across it. For invariant starts without
// invariant ends, we can use the fact that the invariantness never ends to		// invariant ends, we can use the fact that the invariantness never ends to
// start a scope in the current generation which is true for all future		// start a scope in the current generation which is true for all future
// generations. Also, we don't need to consume the last store since the		// generations. Also, we don't need to consume the last store since the
// semantics of invariant.start allow us to perform DSE of the last		// semantics of invariant.start allow us to perform DSE of the last
// store, if there was a store following invariant.start. Consider:		// store, if there was a store following invariant.start. Consider:
//		//
// store 30, i8* p		// store 30, i8* p
// invariant.start(p)		// invariant.start(p)
// store 40, i8* p		// store 40, i8* p
// We can DSE the store to 30, since the store 40 to invariant location p		// We can DSE the store to 30, since the store 40 to invariant location p
// causes undefined behaviour.		// causes undefined behaviour.
if (match(&Inst, m_Intrinsic<Intrinsic::invariant_start>())) {		if (match(&Inst, m_Intrinsic<Intrinsic::invariant_start>())) {
// If there are any uses, the scope might end.		// If there are any uses, the scope might end.
if (!Inst.use_empty())		if (!Inst.use_empty())
continue;		continue;
MemoryLocation MemLoc =		MemoryLocation MemLoc =
MemoryLocation::getForArgument(&cast<CallInst>(Inst), 1, TLI);		MemoryLocation::getForArgument(&cast<CallInst>(Inst), 1, TLI);
// Don't start a scope if we already have a better one pushed		// Don't start a scope if we already have a better one pushed
if (!AvailableInvariants.count(MemLoc))		if (!AvailableInvariants.count(MemLoc))
AvailableInvariants.insert(MemLoc, CurrentGeneration);		AvailableInvariants.insert(MemLoc, CurrentGeneration.MemoryGeneration);
continue;		continue;
}		}

if (isGuard(&Inst)) {		if (isGuard(&Inst)) {
if (auto *CondI =		if (auto *CondI =
dyn_cast<Instruction>(cast<CallInst>(Inst).getArgOperand(0))) {		dyn_cast<Instruction>(cast<CallInst>(Inst).getArgOperand(0))) {
if (SimpleValue::canHandle(CondI)) {		if (SimpleValue::canHandle(CondI)) {
// Do we already know the actual value of this condition?		// Do we already know the actual value of this condition?
if (auto *KnownCond = AvailableValues.lookup(CondI)) {		auto P = AvailableValues.lookup(CondI);
		Value *KnownCond = P.V;
		if (KnownCond && P.Generation == CurrentGeneration.ValueGeneration) {
// Is the condition known to be true?		// Is the condition known to be true?
if (isa<ConstantInt>(KnownCond) &&		if (isa<ConstantInt>(P.V) &&
cast<ConstantInt>(KnownCond)->isOne()) {		cast<ConstantInt>(KnownCond)->isOne()) {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "EarlyCSE removing guard: " << Inst << '\n');		<< "EarlyCSE removing guard: " << Inst << '\n');
salvageKnowledge(&Inst, &AC);		salvageKnowledge(&Inst, &AC);
removeMSSA(Inst);		removeMSSA(Inst);
Inst.eraseFromParent();		Inst.eraseFromParent();
Changed = true;		Changed = true;
continue;		continue;
} else		} else
// Use the known value if it wasn't true.		// Use the known value if it wasn't true.
cast<CallInst>(Inst).setArgOperand(0, KnownCond);		cast<CallInst>(Inst).setArgOperand(0, KnownCond);
}		}
// The condition we're guarding on here is true for all dominated		// The condition we're guarding on here is true for all dominated
// locations.		// locations.
AvailableValues.insert(CondI, ConstantInt::getTrue(BB->getContext()));		AvailableValues.insert(CondI, {ConstantInt::getTrue(BB->getContext()),
		CurrentGeneration.ValueGeneration});
}		}
}		}

// Guard intrinsics read all memory, but don't write any memory.		// Guard intrinsics read all memory, but don't write any memory.
// Accordingly, don't update the generation but consume the last store (to		// Accordingly, don't update the generation but consume the last store (to
// avoid an incorrect DSE).		// avoid an incorrect DSE).
LastStore = nullptr;		LastStore = nullptr;
continue;		continue;
Show All 24 Lines	if (Value *V = SimplifyInstruction(&Inst, SQ)) {
if (Killed)		if (Killed)
continue;		continue;
}		}
}		}

// If this is a simple instruction that we can value number, process it.		// If this is a simple instruction that we can value number, process it.
if (SimpleValue::canHandle(&Inst)) {		if (SimpleValue::canHandle(&Inst)) {
// See if the instruction has an available value. If so, use it.		// See if the instruction has an available value. If so, use it.
if (Value *V = AvailableValues.lookup(&Inst)) {		auto P = AvailableValues.lookup(&Inst);
		Value *V = P.V;
		if (V && P.Generation == CurrentGeneration.ValueGeneration) {
LLVM_DEBUG(dbgs() << "EarlyCSE CSE: " << Inst << " to: " << *V		LLVM_DEBUG(dbgs() << "EarlyCSE CSE: " << Inst << " to: " << *V
<< '\n');		<< '\n');
if (!DebugCounter::shouldExecute(CSECounter)) {		if (!DebugCounter::shouldExecute(CSECounter)) {
LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");		LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");
continue;		continue;
}		}
if (auto *I = dyn_cast<Instruction>(V))		if (auto *I = dyn_cast<Instruction>(V))
I->andIRFlags(&Inst);		I->andIRFlags(&Inst);
Inst.replaceAllUsesWith(V);		Inst.replaceAllUsesWith(V);
salvageKnowledge(&Inst, &AC);		salvageKnowledge(&Inst, &AC);
removeMSSA(Inst);		removeMSSA(Inst);
Inst.eraseFromParent();		Inst.eraseFromParent();
Changed = true;		Changed = true;
++NumCSE;		++NumCSE;
continue;		continue;
}		}

// Otherwise, just remember that this value is available.		// Otherwise, just remember that this value is available.
AvailableValues.insert(&Inst, &Inst);		AvailableValues.insert(&Inst, {&Inst, CurrentGeneration.ValueGeneration});
continue;		continue;
}		}

ParseMemoryInst MemInst(&Inst, TTI);		ParseMemoryInst MemInst(&Inst, TTI);
// If this is a non-volatile load, process it.		// If this is a non-volatile load, process it.
if (MemInst.isValid() && MemInst.isLoad()) {		if (MemInst.isValid() && MemInst.isLoad()) {
// (conservatively) we can't peek past the ordering implied by this		// (conservatively) we can't peek past the ordering implied by this
// operation, but we can add this load to our set of available values		// operation, but we can add this load to our set of available values
if (MemInst.isVolatile() || !MemInst.isUnordered()) {		if (MemInst.isVolatile() || !MemInst.isUnordered()) {
LastStore = nullptr;		LastStore = nullptr;
++CurrentGeneration;		++CurrentGeneration.MemoryGeneration;
}		}

if (MemInst.isInvariantLoad()) {		if (MemInst.isInvariantLoad()) {
// If we pass an invariant load, we know that memory location is		// If we pass an invariant load, we know that memory location is
// indefinitely constant from the moment of first dereferenceability.		// indefinitely constant from the moment of first dereferenceability.
// We conservatively treat the invariant_load as that moment. If we		// We conservatively treat the invariant_load as that moment. If we
// pass an invariant load after already establishing a scope, don't		// pass an invariant load after already establishing a scope, don't
// restart it since we want to preserve the earliest point seen.		// restart it since we want to preserve the earliest point seen.
auto MemLoc = MemoryLocation::get(&Inst);		auto MemLoc = MemoryLocation::get(&Inst);
if (!AvailableInvariants.count(MemLoc))		if (!AvailableInvariants.count(MemLoc))
AvailableInvariants.insert(MemLoc, CurrentGeneration);		AvailableInvariants.insert(MemLoc,
		CurrentGeneration.MemoryGeneration);
}		}

// If we have an available version of this load, and if it is the right		// If we have an available version of this load, and if it is the right
// generation or the load is known to be from an invariant location,		// generation or the load is known to be from an invariant location,
// replace this instruction.		// replace this instruction.
//		//
// If either the dominating load or the current load are invariant, then		// If either the dominating load or the current load are invariant, then
// we can assume the current load loads the same value as the dominating		// we can assume the current load loads the same value as the dominating
// load.		// load.
LoadValue InVal = AvailableLoads.lookup(MemInst.getPointerOperand());		LoadValue InVal = AvailableLoads.lookup(MemInst.getPointerOperand());
if (Value *Op = getMatchingValue(InVal, MemInst, CurrentGeneration)) {		if (Value *Op = getMatchingValue(InVal, MemInst,
		CurrentGeneration.MemoryGeneration)) {
LLVM_DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << Inst		LLVM_DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << Inst
<< " to: " << *InVal.DefInst << '\n');		<< " to: " << *InVal.DefInst << '\n');
if (!DebugCounter::shouldExecute(CSECounter)) {		if (!DebugCounter::shouldExecute(CSECounter)) {
LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");		LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");
continue;		continue;
}		}
if (!Inst.use_empty())		if (!Inst.use_empty())
Inst.replaceAllUsesWith(Op);		Inst.replaceAllUsesWith(Op);
salvageKnowledge(&Inst, &AC);		salvageKnowledge(&Inst, &AC);
removeMSSA(Inst);		removeMSSA(Inst);
Inst.eraseFromParent();		Inst.eraseFromParent();
Changed = true;		Changed = true;
++NumCSELoad;		++NumCSELoad;
continue;		continue;
}		}

// Otherwise, remember that we have this instruction.		// Otherwise, remember that we have this instruction.
AvailableLoads.insert(MemInst.getPointerOperand(),		AvailableLoads.insert(MemInst.getPointerOperand(),
LoadValue(&Inst, CurrentGeneration,		LoadValue(&Inst, CurrentGeneration.MemoryGeneration,
MemInst.getMatchingId(),		MemInst.getMatchingId(),
MemInst.isAtomic()));		MemInst.isAtomic()));
LastStore = nullptr;		LastStore = nullptr;
continue;		continue;
}		}

// If this instruction may read from memory or throw (and potentially read		// If this instruction may read from memory or throw (and potentially read
// from memory in the exception handler), forget LastStore. Load/store		// from memory in the exception handler), forget LastStore. Load/store
// intrinsics will indicate both a read and a write to memory. The target		// intrinsics will indicate both a read and a write to memory. The target
// may override this (e.g. so that a store intrinsic does not read from		// may override this (e.g. so that a store intrinsic does not read from
// memory, and thus will be treated the same as a regular store for		// memory, and thus will be treated the same as a regular store for
// commoning purposes).		// commoning purposes).
if ((Inst.mayReadFromMemory() || Inst.mayThrow()) &&		if ((Inst.mayReadFromMemory() || Inst.mayThrow()) &&
!(MemInst.isValid() && !MemInst.mayReadFromMemory()))		!(MemInst.isValid() && !MemInst.mayReadFromMemory()))
LastStore = nullptr;		LastStore = nullptr;

// If this is a read-only call, process it.		// If this is a read-only call, process it.
if (CallValue::canHandle(&Inst)) {		if (CallValue::canHandle(&Inst)) {
// If we have an available version of this call, and if it is the right		// If we have an available version of this call, and if it is the right
// generation, replace this instruction.		// generation, replace this instruction.
std::pair<Instruction *, unsigned> InVal = AvailableCalls.lookup(&Inst);		std::pair<Instruction *, unsigned> InVal = AvailableCalls.lookup(&Inst);
if (InVal.first != nullptr &&		if (InVal.first != nullptr &&
isSameMemGeneration(InVal.second, CurrentGeneration, InVal.first,		isSameMemGeneration(InVal.second, CurrentGeneration.MemoryGeneration,
&Inst)) {		InVal.first, &Inst)) {
LLVM_DEBUG(dbgs() << "EarlyCSE CSE CALL: " << Inst		LLVM_DEBUG(dbgs() << "EarlyCSE CSE CALL: " << Inst
<< " to: " << *InVal.first << '\n');		<< " to: " << *InVal.first << '\n');
if (!DebugCounter::shouldExecute(CSECounter)) {		if (!DebugCounter::shouldExecute(CSECounter)) {
LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");		LLVM_DEBUG(dbgs() << "Skipping due to debug counter\n");
continue;		continue;
}		}
if (!Inst.use_empty())		if (!Inst.use_empty())
Inst.replaceAllUsesWith(InVal.first);		Inst.replaceAllUsesWith(InVal.first);
salvageKnowledge(&Inst, &AC);		salvageKnowledge(&Inst, &AC);
removeMSSA(Inst);		removeMSSA(Inst);
Inst.eraseFromParent();		Inst.eraseFromParent();
Changed = true;		Changed = true;
++NumCSECall;		++NumCSECall;
continue;		continue;
}		}

// Otherwise, remember that we have this instruction.		// Otherwise, remember that we have this instruction.
AvailableCalls.insert(&Inst, std::make_pair(&Inst, CurrentGeneration));		AvailableCalls.insert(
		&Inst, std::make_pair(&Inst, CurrentGeneration.MemoryGeneration));
continue;		continue;
}		}

// A release fence requires that all stores complete before it, but does		// A release fence requires that all stores complete before it, but does
// not prevent the reordering of following loads 'before' the fence. As a		// not prevent the reordering of following loads 'before' the fence. As a
// result, we don't need to consider it as writing to memory and don't need		// result, we don't need to consider it as writing to memory and don't need
// to advance the generation. We do need to prevent DSE across the fence,		// to advance the generation. We do need to prevent DSE across the fence,
// but that's handled above.		// but that's handled above.
if (auto *FI = dyn_cast<FenceInst>(&Inst))		if (auto *FI = dyn_cast<FenceInst>(&Inst))
if (FI->getOrdering() == AtomicOrdering::Release) {		if (FI->getOrdering() == AtomicOrdering::Release) {
assert(Inst.mayReadFromMemory() && "relied on to prevent DSE above");		assert(Inst.mayReadFromMemory() && "relied on to prevent DSE above");
continue;		continue;
}		}

// write back DSE - If we write back the same value we just loaded from		// write back DSE - If we write back the same value we just loaded from
// the same location and haven't passed any intervening writes or ordering		// the same location and haven't passed any intervening writes or ordering
// operations, we can remove the write. The primary benefit is in allowing		// operations, we can remove the write. The primary benefit is in allowing
// the available load table to remain valid and value forward past where		// the available load table to remain valid and value forward past where
// the store originally was.		// the store originally was.
if (MemInst.isValid() && MemInst.isStore()) {		if (MemInst.isValid() && MemInst.isStore()) {
LoadValue InVal = AvailableLoads.lookup(MemInst.getPointerOperand());		LoadValue InVal = AvailableLoads.lookup(MemInst.getPointerOperand());
if (InVal.DefInst &&		if (InVal.DefInst &&
InVal.DefInst == getMatchingValue(InVal, MemInst, CurrentGeneration)) {		InVal.DefInst ==
		getMatchingValue(InVal, MemInst,
		CurrentGeneration.MemoryGeneration)) {
// It is okay to have a LastStore to a different pointer here if MemorySSA		// It is okay to have a LastStore to a different pointer here if MemorySSA
// tells us that the load and store are from the same memory generation.		// tells us that the load and store are from the same memory generation.
// In that case, LastStore should keep its present value since we're		// In that case, LastStore should keep its present value since we're
// removing the current store.		// removing the current store.
assert((!LastStore ||		assert((!LastStore ||
ParseMemoryInst(LastStore, TTI).getPointerOperand() ==		ParseMemoryInst(LastStore, TTI).getPointerOperand() ==
MemInst.getPointerOperand() ||		MemInst.getPointerOperand() ||
MSSA) &&		MSSA) &&
Show All 13 Lines	if (MemInst.isValid() && MemInst.isStore()) {
continue;		continue;
}		}
}		}

// Okay, this isn't something we can CSE at all. Check to see if it is		// Okay, this isn't something we can CSE at all. Check to see if it is
// something that could modify memory. If so, our available memory values		// something that could modify memory. If so, our available memory values
// cannot be used so bump the generation count.		// cannot be used so bump the generation count.
if (Inst.mayWriteToMemory()) {		if (Inst.mayWriteToMemory()) {
++CurrentGeneration;		++CurrentGeneration.MemoryGeneration;

if (MemInst.isValid() && MemInst.isStore()) {		if (MemInst.isValid() && MemInst.isStore()) {
// We do a trivial form of DSE if there are two stores to the same		// We do a trivial form of DSE if there are two stores to the same
// location with no intervening loads. Delete the earlier store.		// location with no intervening loads. Delete the earlier store.
if (LastStore) {		if (LastStore) {
if (overridingStores(ParseMemoryInst(LastStore, TTI), MemInst)) {		if (overridingStores(ParseMemoryInst(LastStore, TTI), MemInst)) {
LLVM_DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore		LLVM_DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore
<< " due to: " << Inst << '\n');		<< " due to: " << Inst << '\n');
Show All 11 Lines	if (Inst.mayWriteToMemory()) {
// fallthrough - we can exploit information about this store		// fallthrough - we can exploit information about this store
}		}

// Okay, we just invalidated anything we knew about loaded values. Try		// Okay, we just invalidated anything we knew about loaded values. Try
// to salvage something by remembering that the stored value is a live		// to salvage something by remembering that the stored value is a live
// version of the pointer. It is safe to forward from volatile stores		// version of the pointer. It is safe to forward from volatile stores
// to non-volatile loads, so we don't have to check for volatility of		// to non-volatile loads, so we don't have to check for volatility of
// the store.		// the store.
AvailableLoads.insert(MemInst.getPointerOperand(),		AvailableLoads.insert(
LoadValue(&Inst, CurrentGeneration,		MemInst.getPointerOperand(),
MemInst.getMatchingId(),		LoadValue(&Inst, CurrentGeneration.MemoryGeneration,
MemInst.isAtomic()));		MemInst.getMatchingId(), MemInst.isAtomic()));

// Remember that this was the last unordered store we saw for DSE. We		// Remember that this was the last unordered store we saw for DSE. We
// don't yet handle DSE on ordered or volatile stores since we don't		// don't yet handle DSE on ordered or volatile stores since we don't
// have a good way to model the ordering requirement for following		// have a good way to model the ordering requirement for following
// passes once the store is removed. We could insert a fence, but		// passes once the store is removed. We could insert a fence, but
// since fences are slightly stronger than stores in their ordering,		// since fences are slightly stronger than stores in their ordering,
// it's not clear this is a profitable transform. Another option would		// it's not clear this is a profitable transform. Another option would
// be to merge the ordering with that of the post dominating store.		// be to merge the ordering with that of the post dominating store.
Show All 19 Lines	bool EarlyCSE::run() {
bool Changed = false;		bool Changed = false;

// Process the root node.		// Process the root node.
nodesToProcess.push_back(new StackNode(		nodesToProcess.push_back(new StackNode(
AvailableValues, AvailableLoads, AvailableInvariants, AvailableCalls,		AvailableValues, AvailableLoads, AvailableInvariants, AvailableCalls,
CurrentGeneration, DT.getRootNode(),		CurrentGeneration, DT.getRootNode(),
DT.getRootNode()->begin(), DT.getRootNode()->end()));		DT.getRootNode()->begin(), DT.getRootNode()->end()));

assert(!CurrentGeneration && "Create a new EarlyCSE instance to rerun it.");		assert(!CurrentGeneration.MemoryGeneration &&
		"Create a new EarlyCSE instance to rerun it.");

// Process the stack.		// Process the stack.
while (!nodesToProcess.empty()) {		while (!nodesToProcess.empty()) {
// Grab the first item off the stack. Set the current generation, remove		// Grab the first item off the stack. Set the current generation, remove
// the node from the stack, and process it.		// the node from the stack, and process it.
StackNode *NodeToProcess = nodesToProcess.back();		StackNode *NodeToProcess = nodesToProcess.back();

// Initialize class members.		// Initialize class members.
▲ Show 20 Lines • Show All 139 Lines • Show Last 20 Lines
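The EarlyCSE change above tags every entry in `AvailableValues` with the value generation it was recorded in and only reuses an entry when that generation matches the current one. Below is a minimal, self-contained sketch of that idea. It is a toy model, not the patch's code: `ToyValueTable`, `TaggedValue`, `cseOrRecord`, and the string keys are all hypothetical stand-ins, and the real pass uses a scoped hash table keyed on instructions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an AvailableValues entry: the value plus the
// value generation in which it was recorded.
struct TaggedValue {
  std::string Value;
  unsigned Generation = 0;
};

// Toy model of a generation-tagged value table (not LLVM's data structure).
class ToyValueTable {
  std::unordered_map<std::string, TaggedValue> Table;
  unsigned ValueGeneration = 0;

public:
  // Models reaching a coroutine suspend point: entries recorded earlier
  // stay in the table but no longer match the current generation.
  void suspend() { ++ValueGeneration; }

  // Returns the recorded value only if it belongs to the current generation.
  std::optional<std::string> lookup(const std::string &Expr) const {
    auto It = Table.find(Expr);
    if (It == Table.end() || It->second.Generation != ValueGeneration)
      return std::nullopt;
    return It->second.Value;
  }

  // Records (or overwrites) the value for Expr in the current generation.
  void insert(const std::string &Expr, const std::string &Value) {
    assert(!Expr.empty() && "expression key must be non-empty");
    Table.insert_or_assign(Expr, TaggedValue{Value, ValueGeneration});
  }

  // Mirrors the "reuse if available, otherwise remember" step: returns the
  // name of the value that uses of NewValue should refer to.
  std::string cseOrRecord(const std::string &Expr,
                          const std::string &NewValue) {
    if (auto Existing = lookup(Expr))
      return *Existing;
    insert(Expr, NewValue);
    return NewValue;
  }
};
```

In this model, two occurrences of `mul %arg, 2` before a call to `suspend()` collapse to the first one, while an occurrence after `suspend()` is recorded afresh, which is the behaviour the new test expects for `%y1` and `%y2`.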

llvm/lib/Transforms/Scalar/GVN.cpp

Show First 20 Lines • Show All 2,129 Lines • ▼ Show 20 Lines	if (Value *V = SimplifyInstruction(I, {DL, TLI, DT, AC})) {
if (Changed) {		if (Changed) {
if (MD && V->getType()->isPtrOrPtrVectorTy())		if (MD && V->getType()->isPtrOrPtrVectorTy())
MD->invalidateCachedPointerInfo(V);		MD->invalidateCachedPointerInfo(V);
++NumGVNSimpl;		++NumGVNSimpl;
return true;		return true;
}		}
}		}

if (IntrinsicInst *IntrinsicI = dyn_cast<IntrinsicInst>(I))		if (IntrinsicInst *IntrinsicI = dyn_cast<IntrinsicInst>(I)) {
if (IntrinsicI->getIntrinsicID() == Intrinsic::assume)		if (IntrinsicI->getIntrinsicID() == Intrinsic::assume)
return processAssumeIntrinsic(IntrinsicI);		return processAssumeIntrinsic(IntrinsicI);
		if (IntrinsicI->getIntrinsicID() == Intrinsic::coro_suspend) {
		rjmccall (inline comment, unsubmitted, not done): Different coroutine lowerings use different suspend instructions. Can you write this (and similar conditions elsewhere in the patch) so that it'll apply in any lowering? Maybe add a `canSuspendCoroutine` method to `IntrinsicInst`.
		// Prevent value reusing across coroutine suspensions. Values
		// may change since they could run on different threads.
		VN.clear();
		LeaderTable.clear();
		return false;
		}
		}

if (LoadInst *LI = dyn_cast<LoadInst>(I)) {		if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
if (processLoad(LI))		if (processLoad(LI))
return true;		return true;

unsigned Num = VN.lookupOrAdd(LI);		unsigned Num = VN.lookupOrAdd(LI);
addToLeaderTable(Num, LI, LI->getParent());		addToLeaderTable(Num, LI, LI->getParent());
return false;		return false;
▲ Show 20 Lines • Show All 749 Lines • Show Last 20 Lines
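The inline comment in the GVN hunk asks for a single predicate that covers every suspend instruction rather than checking `coro_suspend` alone. A rough sketch of what such a predicate could look like follows. Everything here is an assumption for illustration: `ToyIntrinsic` is a hypothetical enum standing in for `Intrinsic::ID`, the free function `canSuspendCoroutine` only borrows the name suggested in the comment, and the set of suspend intrinsics listed (`llvm.coro.suspend` and `llvm.coro.suspend.retcon`) may not be complete for every lowering.

```cpp
#include <cassert>

// Hypothetical stand-in for a handful of llvm::Intrinsic::ID values.
enum class ToyIntrinsic {
  Assume,
  SideEffect,
  CoroSuspend,       // models llvm.coro.suspend (switch lowering)
  CoroSuspendRetcon, // models llvm.coro.suspend.retcon (returned-continuation lowering)
  CoroEnd
};

// Hypothetical helper: true for any intrinsic that may suspend the
// enclosing coroutine, regardless of which lowering produced it.
constexpr bool canSuspendCoroutine(ToyIntrinsic ID) {
  switch (ID) {
  case ToyIntrinsic::CoroSuspend:
  case ToyIntrinsic::CoroSuspendRetcon:
    return true;
  default:
    return false;
  }
}

// Compile-time sanity checks on the toy model.
static_assert(canSuspendCoroutine(ToyIntrinsic::CoroSuspend),
              "switch-lowering suspend must be recognised");
static_assert(!canSuspendCoroutine(ToyIntrinsic::Assume),
              "assume never suspends");
```

With a helper of this shape, both the EarlyCSE generation bump and the GVN table reset could test one condition, so a newly added lowering would only need to be taught to the helper.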

llvm/test/Transforms/Coroutines/coro-no-value-reuse-across-suspend.ll

This file was added.

				; check that no optimization pass would ever reuse values across
				; suspension points in coroutines.
				; RUN: opt < %s -O3 -S | FileCheck %s

				define i8* @foo(i64 %arg) {
				entry:
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @myAlloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%x1 = call i64 @pthread_self()
				call void @print(i64 %x1)
				%y1 = mul i64 %arg, 2
				call void @print(i64 %y1)

				%0 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %0, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%x2 = call i64 @pthread_self()
				call void @print(i64 %x2)
				%y2 = mul i64 %arg, 2
				call void @print(i64 %y2)

				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; CHECK-LABEL: define i8* @foo(
				; CHECK: entry:
				; CHECK: %x1 = call i64 @pthread_self()
				; CHECK: %y1 = mul i64 %arg, 2
				; CHECK: resume:
				; CHECK: %x2 = call i64 @pthread_self()
				; CHECK: %y2 = mul i64 %arg, 2

				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)
				declare void @llvm.coro.resume(i8*)
				declare void @llvm.coro.destroy(i8*)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @myAlloc(i32)
				declare void @free(i8*)
				declare void @print(i64)

				declare dso_local i64 @pthread_self() #1

				attributes #1 = { nounwind readnone }
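The test above expects both `pthread_self` calls to survive because the code after the suspend point may be resumed on another thread. The following sketch illustrates that situation in plain C++ without coroutines. It is a toy model under stated assumptions: `ToyCoroutine`, `runUntilSuspend`, `resume`, and `threadIdChangesAcrossSuspend` are hypothetical names, and `std::this_thread::get_id()` stands in for `pthread_self`.

```cpp
#include <cassert>
#include <thread>

// Toy model of a coroutine body split at one suspend point. Each half
// records the identity of the thread it ran on.
struct ToyCoroutine {
  std::thread::id IdBeforeSuspend;
  std::thread::id IdAfterResume;

  // Code before the suspend point (like %x1 in the test).
  void runUntilSuspend() { IdBeforeSuspend = std::this_thread::get_id(); }

  // Code after the suspend point (like %x2 in the test).
  void resume() { IdAfterResume = std::this_thread::get_id(); }
};

// Runs the first half on the calling thread and resumes the second half on
// a worker thread. Returns true if the observed thread identity changed.
inline bool threadIdChangesAcrossSuspend() {
  ToyCoroutine C;
  C.runUntilSuspend();

  std::thread Worker([&C] { C.resume(); });
  Worker.join(); // join() makes the worker's write to C visible here

  // The caller itself has not migrated; only the resumed half ran elsewhere.
  assert(C.IdBeforeSuspend == std::this_thread::get_id());
  return C.IdBeforeSuspend != C.IdAfterResume;
}
```

If the second query were replaced by the cached result of the first, the resumed half would report the identity of the thread that ran the first half, which is the stale value the patch is trying to avoid.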