This is an archive of the discontinued LLVM Phabricator instance.

Allow MemoryLocation to carry pre-existing knowledge to AA to elide expensive repeated checks
Needs ReviewPublic

Authored by dsanders on Oct 12 2018, 9:39 AM.

Details

Reviewers
hfinkel
sanjoy
Summary

DeadStoreElimination currently calls getModRefInfo() for each combination of
exit block and alloca (and similarly local allocs). Each one of these calls
is checking whether the given memory location is a non-escaping local. In
the case of one function I have, this is ~5,000 exit blocks * ~13,000 allocas =
~65 million calls to getModRefInfo(). As a result it spends ~57s on
DeadStoreElimination. Unfortunately, DeadStoreElimination finds that it's not
able to make any changes at all so getModRefInfo() is returning the same result
5,000 times for each allocation. While none of the calls to getModRefInfo() are
redundant, 99.98% of the checks inside it that the argument is a non-escaping
local (each of which involves an expensive call to PointerMayBeCaptured()) are
redundant, since that portion of getModRefInfo() depends on a property of the
Value being queried and not on the particular call site.

DeadStoreElimination knows that all of its queries are about locals
(or equivalent to a local such as a non-escaping heap alloc), it doesn't cause
non-escaping locals to escape since it's only removing dead stores, and it
knows when a change may cause an escaping local to stop escaping. Therefore
it has everything getModRefInfo() needs to cache the result of the expensive
PointerMayBeCaptured() call and provide it to future calls.

This patch introduces a means to do that by extending MemoryLocation with a
KnownFlags member which can record pre-existing knowledge which can be used by
its clients to elide particularly expensive checks. This patch currently applies
this caching very conservatively within DeadStoreElimination. Any change at all
to the IR flushes the whole cache. This is partly to keep the overhead of the
cache maintenance down and partly to keep it simple. In the worst case, we end
up doing all the PointerMayBeCaptured() calls anyway, with a small amount of
additional overhead to maintain the cache.
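
In rough outline, the interface looks something like this (simplified; the exact
flag and helper names in the patch may differ):

// Simplified sketch: MemoryLocation carries a small bitmask of facts the
// caller has already established, so AA can skip re-deriving them.
class MemoryLocation {
public:
  enum KnownFlag : unsigned {
    KnownLocalObject = 1u << 0, // alloca or non-escaping heap allocation
    KnownNonEscaping = 1u << 1, // PointerMayBeCaptured() already known false
  };

  bool isKnownToBeLocalObject() const { return KnownFlags & KnownLocalObject; }
  bool isKnownToBeNonEscaping() const { return KnownFlags & KnownNonEscaping; }
  void addKnownFlags(unsigned Flags) { KnownFlags |= Flags; }

private:
  unsigned KnownFlags = 0;
  // Existing Ptr/Size/AATags members unchanged.
};

A pass such as DSE sets these bits once per stack object, and BasicAA's
non-escaping-local check consults them before falling back to the
PointerMayBeCaptured() walk.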

CTMark isn't significantly affected by this patch. With 10x multisampling, I see
two regressions:

  • pairlocalalign: 0.10%
  • sqlite3-link: 0.66%

and several minor improvements, the top 3 are:

  • 7-zip-benchmark-link: -1.37%
  • bullet-link: -0.92%
  • kc-link: -0.82%

and overall the geomean has improved slightly (-0.21%). The resulting binaries
are unchanged. The more interesting result is the motivating function mentioned
above. Previously, DeadStoreElimination was taking ~57s on this function and now
takes ~20s (-65%).

Diff Detail

Event Timeline

dsanders created this revision. Oct 12 2018, 9:39 AM

Hi Sanjoy,

I don't seem to be having much luck getting reviewers for this and I noticed that you reviewed the last big change in getModRefInfo(). Would you be able to take a look at this patch?

Hi Hal,

I didn't spot you in CODE_OWNERS.txt at first because I was searching for 'Alias' instead of 'alias' :-). Would you be able to take a look at this patch too?

asbirlea added inline comments.
lib/Analysis/BasicAliasAnalysis.cpp
819

Add an assert before this to check:

assert((!Loc.isKnownToBeLocalObject() || !Loc.isKnownCouldBeNonLocalObject()) &&
       "Location cannot be both known local and known non-local");
assert((!Loc.isKnownToBeNonEscaping() || !Loc.isKnownMayEscape()) &&
       "Location cannot be both known non-escaping and known it may escape");
822

Update flags in Loc here if isNonEscapingLocalObject was called?
This will make Loc non-const, which is a larger change in itself, and there should be some comments added if we make getModRefInfo() have side effects.
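
Something along these lines (illustrative only; addKnownFlags and the flag names are placeholders, and as said this gives getModRefInfo() a side effect on Loc):

// After the capture walk has been done for this query, record the result on
// the location itself so later queries against the same MemoryLocation can
// skip PointerMayBeCaptured().
const Value *Object = GetUnderlyingObject(Loc.Ptr, DL);
if (isNonEscapingLocalObject(Object))
  Loc.addKnownFlags(MemoryLocation::KnownLocalObject |
                    MemoryLocation::KnownNonEscaping);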

lib/Transforms/Scalar/DeadStoreElimination.cpp
774

Can we avoid this call and leave the Escaping bits both not set?
Precisely because PointerMayBeCaptured() is expensive. Ideally this should be computed only on demand (in getModRefInfo()) and the info passed back when caching.

876

Check if Loc was updated by the ModRef call and update the cache?

dsanders marked an inline comment as done. Oct 31 2018, 5:36 PM

Thanks, Alina.

lib/Analysis/BasicAliasAnalysis.cpp
822

I like the idea of the Loc being updated with things that become known as a result of AA as I'm sure there's plenty more caching that can be done. However, I'm a bit worried about the effects of making the MemoryLocation non-const. My main concern is that callers will find that caching is happening unexpectedly and that passes will therefore cache automatically, modify the IR, but forget to clear the known flags because they weren't aware of the caching. The other concerns are mostly about unintended implicit copies caused by assigning const MemoryLocation to non-const MemoryLocation, and about the size of the patch as dropping the const will likely propagate outwards quite a long way.

I think it would probably be better to add an extra argument to make the caching something the callers have to knowingly opt into. I'm thinking something like:

ModRefInfo getModRefInfo(ImmutableCallSite CS, const MemoryLocation &Loc, MemoryLocationKnowledge &Knowledge);
ModRefInfo getModRefInfo(ImmutableCallSite CS, const MemoryLocation &Loc) {
  MemoryLocationKnowledge Knowledge;
  return getModRefInfo(CS, Loc, Knowledge);
}

That way, callers that aren't aware of it or don't want to preserve it can keep calling:

getModRefInfo(CS, Loc)

and discarding any cachable knowledge. Those that want full caching (e.g. analysis passes) can do something like:

getModRefInfo(CS, Loc, Loc.Knowledge)

meanwhile those that need to be more careful can do:

MemoryLocationKnowledge Knowledge;
getModRefInfo(CS, Loc, Knowledge)
... some transformation ...
Knowledge.forgetX();
Knowledge.forgetY();
Loc.Knowledge = Knowledge;

Does that sound like a good direction to go?

lib/Transforms/Scalar/DeadStoreElimination.cpp
774

We can, although for DSE at least we won't end up saving any further calls to it as a result. DSE calls getModRefInfo() for every local (or local equivalent) in DeadStackObjects, so we'll always call PointerMayBeCaptured() once for each item either way.
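
For reference, the pattern in question is roughly the following (a simplified sketch of the handleEndBlock loop, not the exact source; CS stands for the call site being checked):

// Each end-of-function call site queries every remaining candidate stack
// object individually; without cached knowledge, each query can re-run the
// capture analysis on the same Value.
DeadStackObjects.remove_if([&](Value *Obj) {
  // Drop Obj from the dead set if the call site may read from it.
  return isRefSet(AA->getModRefInfo(CS, MemoryLocation(Obj)));
});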

dsanders updated this revision to Diff 172065. Oct 31 2018, 5:37 PM

Add the sanity checks

Hi Hal,

I didn't spot you in CODE_OWNERS.txt at first because I was searching for 'Alias' instead of 'alias' :-). Would you be able to take a look at this patch too?

Thanks. I suppose that I should echo Chris's question about the proposed local-dominance cache with respect to this change too: If we switch this pass to using MemorySSA does this problem go away?

Hal,

I have mixed feelings here because I understand Chris's point, but there are differences in this case which make this patch worth pushing forward IMO.

MemorySSA in the current format will not solve this. It can be (and should be) extended to handle cases with *some* similarity (because we have other use cases for it). Trying to explain below.

I'm not that familiar with DSE but if I understand correctly, the sequence is:

  • Keep a list of instructions, known to be allocas (or alike?)
  • Call for all callsites *after* those allocas getModRefInfo
  • If CS can read from any of those allocas, all stores above the CS are live (and it's iterating BBs in reverse so marking all preceding stores live as soon as one call is found to Ref)

So the patch adds info about the list of allocas inside the MemoryLocation they modify and saves some of the work getModRefInfo does.

MemorySSA cannot currently provide info about isClobberedBy; it provides uses which are not yet optimized (this is a useful and expensive extension, and we should have a walker for it).
But even if we had this in MemorySSA now, the info cached in this case does not necessarily overlap with storing "isLocal" and "isNonEscaping" for a MemoryLocation, plus it is more expensive.
We're looking here for ModRef between two particular instructions that may have many other memory-accessing instructions in between, with which they MayAlias. A getModRefInfo call should be cheaper than a MemorySSA query here.
Perhaps I could argue that rewriting the checks entirely (to avoid this particular ModRef call) would make MemorySSA more viable, but that's a whole other discussion and it would involve rewriting the pass.

The cost is clearly the extra memory added to MemoryLocation (Chris's objection for BBs) which is a core data structure. Unlike BBs we create MemoryLocations often on the spot and drop them right away.
This may make things better (as far as memory cost throughout the compilation) or worse (as far as allocations).
IMO, as far as caching invalidation, it is clearly better than BBs, since MemoryLocations are not passed between Analyses or Transforms. Hence the KnownFlags set by a pass will be used only in that pass, and whatever caching the pass does is dropped for others.
It could be argued that this could be another extension to add to MemorySSA so we do pass cached info forward, but I think that's beyond the scope of MemorySSA right now.

Hope this makes sense, and please feel free to pull in Chris and the others to give feedback.

FWIW, long term DSE should ideally use MemorySSA instead of MemoryDependenceResults.
But this particular optimization has a high chance of being useful even then.

lib/Analysis/BasicAliasAnalysis.cpp
822

Yes, I think it's good to keep the MemoryLocation const in this patch.

lib/Transforms/Scalar/DeadStoreElimination.cpp
774

Got it.
I'm wondering what's the cause of the regressions you're seeing? Just from adding to the cache and clearing it?

Hi Hal,

I didn't spot you in CODE_OWNERS.txt at first because I was searching for 'Alias' instead of 'alias' :-). Would you be able to take a look at this patch too?

Thanks. I suppose that I should echo Chris's question about the proposed local-dominance cache with respect to this change too: If we switch this pass to using MemorySSA does this problem go away?

Do you mean the question at http://lists.llvm.org/pipermail/llvm-dev/2018-September/126375.html?

I'm not very familiar with the MemorySSA pass but based on a fairly quick skim of it ...
It looks like the MemoryDef/MemoryUse objects are attached to the BB rather than the instructions. That wouldn't give us the accuracy we need for DSE to eliminate the store in something like:

%p = alloca i32          ; any local object works here
%1 = load i32, i32* %p
store i32 %1, i32* %p    ; dead store: %p does not escape and is never read again
unreachable

but it would give a reasonable early test as to whether it's worth asking further questions about the instructions. If the block containing the call site lacks a MemoryUse for a given alloca's MemoryDef, then we could avoid asking whether the call site accesses the given alloca because we know that nothing in the BB accesses it. However, this is only helpful if MemorySSA tries to eliminate MemoryUses for allocas that don't escape or where the called function is inspected and confirmed to not use that particular alloca. If it's conservative (e.g. adds a MemoryUse for all allocas at every call site) then we can't cull the expensive 'is it a non-escaping local?' check, because MemorySSA only tells us that something in the BB used it, rather than that a particular instruction used it.

Looking at MemorySSA's code, I'm wondering if it may be hitting the same performance issue that this patch is targeting. I see that getModRefInfo() is called quite a bit, most notably for every instruction in buildMemorySSA() (via createNewAccess()). Every one of those calls potentially calls the expensive PointerMayBeCaptured() (subject to early exits) and doesn't cache the results of any of them even though it's a property of the MemoryLocation rather than the Instruction. It's also potentially called again inside instructionClobbersQuery() which also looks like it's called fairly often (in walkToPhiOrClobber() and optimizeUsesInBlock()).

Hal,

I have mixed feelings here because I understand Chris's point, but there are differences in this case which make this patch worth pushing forward IMO.

MemorySSA in the current format will not solve this. It can be (and should be) extended to handle cases with *some* similarity (because we have other use cases for it). Trying to explain below.

I'm not that familiar with DSE but if I understand correctly, the sequence is:

  • Keep a list of instructions, known to be allocas (or alike?)
  • Call for all callsites *after* those allocas getModRefInfo
  • If CS can read from any of those allocas, all stores above the CS are live (and it's iterating BBs in reverse so marking all preceding stores live as soon as one call is found to Ref)

That's right. The particular case that was causing performance problems for me was:

call void @__assert_rtn(i8* %0, i8* %1, i8* %2)
unreachable

I had 5,000 exit blocks like that and 13,000 allocas to consider. Every alloca was potentially dead at the unreachable, and DSE called getModRefInfo() for each one to find out whether __assert_rtn was able to access it. The answer was always no, because the allocas were all non-escaping locals, but finding that out was expensive and the check was repeated for every local alloca and every call site.

So the patch adds info about the list of allocas inside the MemoryLocation they modify and saves some of the work getModRefInfo does.

MemorySSA cannot currently provide info about isClobberedBy; it provides uses which are not yet optimized (this is a useful and expensive extension, and we should have a walker for it).
But even if we had this in MemorySSA now, the info cached in this case does not necessarily overlap with storing "isLocal" and "isNonEscaping" for a MemoryLocation, plus it is more expensive.
We're looking here for ModRef between two particular instructions that may have many other memory-accessing instructions in between, with which they MayAlias. A getModRefInfo call should be cheaper than a MemorySSA query here.
Perhaps I could argue that rewriting the checks entirely (to avoid this particular ModRef call) would make MemorySSA more viable, but that's a whole other discussion and it would involve rewriting the pass.

The cost is clearly the extra memory added to MemoryLocation (Chris's objection for BBs) which is a core data structure.

We can potentially eliminate that cost in MemoryLocation using the separate non-const MemoryLocationKnowledge object we were talking about as an input and output. Callers that don't provide it would still need to allocate memory for it, but it would be a default-constructed object in the callee's stack frame.
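
Sketching the shape I have in mind (names assumed from the earlier discussion, not the current patch):

// The cached facts live in a separate object rather than in MemoryLocation,
// so the core data structure does not grow.
struct MemoryLocationKnowledge {
  bool KnownLocalObject = false;
  bool KnownNonEscaping = false;
};

ModRefInfo getModRefInfo(ImmutableCallSite CS, const MemoryLocation &Loc,
                         MemoryLocationKnowledge &Knowledge);
ModRefInfo getModRefInfo(ImmutableCallSite CS, const MemoryLocation &Loc) {
  // Callers that don't opt in pay only for a default-constructed object in
  // this forwarding overload's frame, and the knowledge is discarded.
  MemoryLocationKnowledge Discarded;
  return getModRefInfo(CS, Loc, Discarded);
}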

Unlike BBs we create MemoryLocations often on the spot and drop them right away.
This may make things better (as far as memory cost throughout the compilation) or worse (as far as allocations).
IMO, as far as caching invalidation, it is clearly better than BBs, since MemoryLocations are not passed between Analyses or Transforms. Hence the KnownFlags set by a pass will be used only in that pass, and whatever caching the pass does is dropped for others.
It could be argued that this could be another extension to add to MemorySSA so we do pass cached info forward, but I think that's beyond the scope of MemorySSA right now.

Hope this makes sense, and please feel free to pull in Chris and the others to give feedback.

lib/Transforms/Scalar/DeadStoreElimination.cpp
774

I think so, but I haven't dug into it yet to confirm.

I'm not very familiar with the MemorySSA pass but based on a fairly quick skim of it ...
It looks like the MemoryDef/MemoryUse objects are attached to the BB rather than the instructions.

I'll write more later, but quickly, this is incorrect. MemorySSA has per-instruction granularity.

I'm not very familiar with the MemorySSA pass but based on a fairly quick skim of it ...
It looks like the MemoryDef/MemoryUse objects are attached to the BB rather than the instructions.

I'll write more later, but quickly, this is incorrect. MemorySSA has per-instruction granularity.

Thanks. I got that from some comments about attaching MemoryUse/MemoryDef to basic blocks (not the ones about phi's) but I'm having trouble finding the same comments this morning. Is it storing the additional data in the BB objects or something like that? That could explain how I reached the wrong conclusion.

In that case it will come down to how conservative it is about attaching MemoryUse. Looking at buildMemorySSA() again, createNewAccess() is calling getModRefInfo(const Instruction *, Optional<MemoryLocation>) and is always providing None as the second argument. Following that code path, this causes it to report the behaviour using getModRefBehavior(ImmutableCallSite), which is good news performance-wise since this path doesn't have the expensive is-this-a-non-escaping-local check. It's also good news for DSE when doesNotAccessMemory() is true, since that avoids the need for the expensive call for all allocas on the basis that none of them can be accessed if the call doesn't access memory.

However, it's not enough for DSE when doesNotAccessMemory() is false, since DSE wants to know whether specific allocas are accessed by the call site. We'd still have to iterate over all the possibly-dead allocas, checking each one with getModRefInfo(ImmutableCallSite, MemoryLocation), which (without something like this patch) would repeatedly perform the expensive check to see if the MemoryLocation is a non-escaping local.
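
In code, the two-tier check amounts to something like this (sketch only; keepStoresTo() is a stand-in for whatever DSE does when an object may be read):

// Cheap whole-call-site check first: if the callee doesn't access memory at
// all, no per-alloca queries are needed.
if (AA->doesNotAccessMemory(CS))
  return;
// Otherwise we still have to ask about each possibly-dead alloca, and each of
// these queries can hit the expensive non-escaping-local check.
for (Value *Obj : DeadStackObjects)
  if (isRefSet(AA->getModRefInfo(CS, MemoryLocation(Obj))))
    keepStoresTo(Obj); // hypothetical helper for illustration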

MemorySSA is an analysis; all info is kept "on the side". It's standalone, and no info is kept in BBs.
Some doc can be found here: http://releases.llvm.org/6.0.1/docs/MemorySSA.html

MemorySSA would be a net improvement vs MemoryDependenceAnalysis currently used in DSE.

asbirlea added inline comments. Nov 5 2018, 10:09 AM
lib/Transforms/Scalar/DeadStoreElimination.cpp
1382–1383

Could this go inside eliminateDeadStores(F, ...)?

1415–1416

Same as above.

sanjoy resigned from this revision. Jan 29 2022, 5:30 PM
Herald added a project: Restricted Project. Jan 29 2022, 5:30 PM