This is an archive of the discontinued LLVM Phabricator instance.


Differential D28137

[Devirtualization] MemDep returns non-local !invariant.group dependencies
Closed, Public

Authored by Prazek on Dec 28 2016, 7:45 AM.


Details

Reviewers

chandlerc
reames
rjmccall
dberlin
nlewycky
rsmith

Commits

rG9530883e8c47: [Devirtualization] MemDep returns non-local !invariant.group dependencies
rL291762: [Devirtualization] MemDep returns non-local !invariant.group dependencies

Summary

Memory Dependence Analysis could only return local dependencies when
handling !invariant.group. Now, when it finds a non-local dependency, it
returns NonLocal, and the dependency it found can then be retrieved by
calling getNonLocalPointerDependency.

Thanks to this, we are able to devirtualize calls inside loops:

struct A {
  virtual void foo();
};

void indirect(A &a, int n) {
  for (int i = 0; i < n; i++)
    a.foo();
}

void test(int n) {
  A a;
  indirect(a, n);
}

After inlining, a.foo() is changed into a direct call, even if foo is
defined externally (but only if the vtable definition is available).
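A minimal, self-contained C++ sketch of what devirtualizing the loop means at the source level. It assumes vtable loads carry !invariant.group (as with clang's -fstrict-vtable-pointers), and it gives `foo` a body and an `int` result purely so the example links and is observable; `test_devirtualized` is a hypothetical hand-written equivalent of the optimized code, not actual compiler output.

```cpp
#include <cassert>

// Hypothetical stand-in for the `A` in the summary. foo() gets a body and an
// int result only so this sketch links and has an observable effect.
struct A {
  virtual int foo() { return 1; }
  virtual ~A() = default;
};

// Source as written: each iteration loads a's vtable pointer and calls
// foo() indirectly through it.
int indirect(A &a, int n) {
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a.foo();  // virtual call
  return sum;
}

int test(int n) {
  A a;
  return indirect(a, n);
}

// Hand-written equivalent of what the optimizer can produce after inlining
// indirect() into test(), assuming vtable loads carry !invariant.group:
// the dynamic type of the local `a` is known to be A, so every iteration
// can use a direct, non-virtual call to A::foo.
int test_devirtualized(int n) {
  A a;
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a.A::foo();  // qualified call: no vtable load, no indirect call
  return sum;
}
```

Both versions compute the same result; the second simply performs no vtable load or indirect call per iteration.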

Diff Detail

Repository: rL LLVM

Event Timeline

Prazek updated this revision to Diff 82593. Dec 28 2016, 7:45 AM

Prazek retitled this revision to [Devirtualization] MemDep returns non-local !invariant.group dependencies.

Prazek updated this object.

Prazek added reviewers: nlewycky, dberlin, chandlerc, rsmith.

Prazek added subscribers: llvm-commits, davide, mehdi_amini.

Prazek added a parent revision: D28126: [MemDep] Handle gep with zeros for invariant.group. Dec 28 2016, 7:47 AM

davide added inline comments. Dec 28 2016, 7:59 AM

include/llvm/Analysis/MemoryDependenceAnalysis.h
305–307	(On Diff #82593)	I have a hard time parsing this comment; I think you can remove the `so then` part, or remove it altogether.
449	(On Diff #82593)	Nitpicking, typos: retreived -> retrieved, quered -> queried.
lib/Analysis/MemoryDependenceAnalysis.cpp
344	(On Diff #82593)	Please add an assert message.
371	(On Diff #82593)	Spurious newline.
911	(On Diff #82593)	Why do you need a different scope?
912	(On Diff #82593)	The type of `NonLocalDef` is not entirely obvious, so I'd prefer if you used the explicit one.
test/Transforms/NewGVN/invariant.group.ll
345–347	(On Diff #82593)	Thanks for adding NewGVN tests (although they're still XFAIL'ed, alas).

Prazek added inline comments. Dec 28 2016, 8:12 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
911	(On Diff #82593)	This function is pretty large, so I prefer not to add new names with such a huge scope.
test/Transforms/NewGVN/invariant.group.ll
345–347	(On Diff #82593)	BTW, is there any way we could merge them together while keeping the XFAIL semantics?

Prazek marked 4 inline comments as done. Dec 28 2016, 8:22 AM

Prazek added inline comments.

lib/Analysis/MemoryDependenceAnalysis.cpp
912	(On Diff #82593)	find returns an iterator, so I would like to avoid spelling out this long type name. Does adding an "It" suffix to the variable name work for you?

Small changes

rebase

davide added inline comments. Dec 28 2016, 8:48 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
911	(On Diff #82593)	I agree it's large; maybe in a separate commit you can try splitting it into multiple functions? Not sure how much it buys us, as MemDep will go away soon-ish (6 months to 1 year), but it's worth a try.
912	(On Diff #82593)	Fair enough.
test/Transforms/NewGVN/invariant.group.ll
345–347	(On Diff #82593)	I wish there was (as it would avoid a lot of duplication), but I wasn't able to find one. Happy to do the conversion if you think of/implement a way to share. Side note: I think this is not really high priority, as the only time we copy tests is during transitions (pass rewrites etc.), which are not common enough to justify the burden of implementing a conditional XFAIL on a per-pass basis. YMMV.

Prazek marked 8 inline comments as done. Dec 28 2016, 9:19 AM

Prazek added inline comments.

lib/Analysis/MemoryDependenceAnalysis.cpp
911	(On Diff #82593)	I will see what I can do about it, but I think it is more worthwhile to implement !invariant.group in MemorySSA.

davide added inline comments. Dec 28 2016, 9:22 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
911	(On Diff #82593)	I wholeheartedly agree.

Prazek marked 3 inline comments as done. Dec 28 2016, 9:31 AM

rebase

Prazek added a reviewer: reames. Dec 30 2016, 2:10 AM

rebase

small format

Prazek added a subscriber: vsk. Dec 30 2016, 7:23 AM

The use of a side cache here appears unnecessary. Why can't you simply return the non-local result to the immediate caller and let it be kept in a local unless needed?

Also, you have a problem with the quality of your results. You're returning the first non-local found, not the best non-local found. This makes the results use list order dependent. We generally try not to have use list order sensitivities in the optimizer.

This revision now requires changes to proceed. Dec 30 2016, 10:02 AM

In D28137#632782, @reames wrote:

The use of a side cache here appears unnecessary. Why can't you simply return the non-local result to the immediate caller and let it be kept in a local unless needed?

What do you mean by "kept in local"? Do you mean storing one def as a member of MemDep?

Also, you have a problem with the quality of your results. You're returning the first non-local found, not the best non-local found. This makes the results use list order dependent. We generally try not to have use list order sensitivities in the optimizer.

But this doesn't matter for invariant.group. Let's say that I have two loads with invariant.group, one local and one non-local:

  %0 = load i8, i8* %a, !invariant.group !0
  call void @foo(i8* %a)
  br label %b1

b1:
  %1 = load i8, i8* %a, !invariant.group !0
  call void @foo(i8* %a)
  %2 = load i8, i8* %a, !invariant.group !0

There is a guarantee that %0, %1 and %2 load the same value. If I ask for the dependency of %2, then whether I get def(%1) or def(%0), %2 will be replaced by either %0 or %1; and when we query %1, it will be replaced by %0, which means that we end up with %0 everywhere. This assumes that GVN is sane.

I can continue looping and either try hard to find a local dependency or 'the best' non-local dependency (which probably won't be found, because GVN will already have removed it) and return it, or even collect a list of all non-local dependencies.

I am also not sure I understand what 'the best' non-local dependency means.

In D28137#632801, @Prazek wrote:
In D28137#632782, @reames wrote:

The use of a side cache here appears unnecessary. Why can't you simply return the non-local result to the immediate caller and let it be kept in a local unless needed?

What do you mean by "kept in local"? Do you mean storing one def as a member of MemDep?

Also, you have a problem with the quality of your results. You're returning the first non-local found, not the best non-local found. This makes the results use list order dependent. We generally try not to have use list order sensitivities in the optimizer.

But this doesn't matter for invariant.group. Let's say that I have two loads with invariant.group, one local and one non-local:
  %0 = load i8, i8* %a, !invariant.group !0
  call void @foo(i8* %a)
  br label %b1

b1:
  %1 = load i8, i8* %a, !invariant.group !0
  call void @foo(i8* %a)
  %2 = load i8, i8* %a, !invariant.group !0
There is a guarantee that %0, %1 and %2 load the same value. If I ask for the dependency of %2, then whether I get def(%1) or def(%0), %2 will be replaced by either %0 or %1; and when we query %1, it will be replaced by %0, which means that we end up with %0 everywhere. This assumes that GVN is sane.

I understand your argument that after optimizing to a fixed point, we should get the same result. The problem is that the *order* of iteration will differ. This can mean differences in naming, memory allocation patterns, or even output (I don't believe GVNPRE actually iterates to a fixed point.)

I can continue looping and either try hard to find a local dependency or 'the best' non-local dependency (which probably won't be found, because GVN will already have removed it) and return it, or even collect a list of all non-local dependencies.

The key point is that we should return a consistent result, regardless of which query path we see. Returning the most dominating non-local would be one reasonable scheme.

I am also not sure I understand what 'the best' non-local dependency means.

Don't get too caught up on "best" here. The ordering point above is the primary concern. A secondary concern is reducing the number of iterations required to reach the fixed point. Why do multiple full scans if we can bypass all but one with a small amount of extra work? Particularly work that we know only happens when we have found a useful result and are just trying to find a better one? (i.e. we're not burning time when we're not making progress.)

lib/Analysis/MemoryDependenceAnalysis.cpp
417	(On Diff #82725)	OK, my comment on the previous round was unclear and probably misleading. My primary concern is the use of a side structure to propagate results from one call to the next. This introduces a bunch of complexity (do we need cache invalidation? are we leaking memory?) which I'd like to avoid if we can. Can we structure this such that we split this function and call it once for local and once for non-local? I know this will do strictly more work, but I'd prefer that over the code complexity.

Hmm, shouldn't it be the least dominating one instead? Then we would prefer local dependencies over non-local ones.
I have never faced the problem with iterations (in GVN or anywhere else), so I will trust you that it will be valuable to fix it.
I am not sure if it is valuable to fix the second issue.

lib/Analysis/MemoryDependenceAnalysis.cpp
417	(On Diff #82725)	Yes, I totally agree that this is a hacky solution. I don't see a simple way to solve it except for a major refactoring of MemDep. The main problem here is that all the users of MemDep expect to get local results first, and then ask for non-local dependencies. I first tried to change Dep into LocalDep and add NonLocalDep, but then I found that I would have to look at all the uses of MemDep and understand all the algorithms to make sure everything is still valid. Another way would be to run the same algorithm a second time in the non-local query, but this would make even less sense than the solution we have today. There is also a strong argument, IMHO, to leave it as it is: MemDep is being deprecated, and the plan is to remove it in 6-12 months. Because NewGVN is not there yet, I would like to make this small hack so that clang-4.0 will be able to do much better devirtualization. Then I am planning to implement handling of invariant.group in MemorySSA and of assume in NewGVN, but this will probably take me a while.
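To make the side-cache protocol under discussion concrete, here is a hypothetical toy model in plain C++ (none of these types or names are LLVM's): the local query returns NonLocal and stashes the def, and the follow-up non-local query consumes and erases it, roughly mirroring NonLocalDefsCache in the committed diff.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy model of the NonLocalDefsCache protocol; not LLVM's API.
// Instructions are plain ints and a dependency is the name of the defining
// instruction.
enum class DepKind { Def, NonLocal, Unknown };

struct DepResult {
  DepKind Kind;
  std::string DefName;  // only meaningful when Kind == DepKind::Def
};

class ToyMemDep {
  // Side cache: written by the local query, consumed by the non-local query.
  // In the real code the value holds an Instruction*, which can dangle if that
  // instruction is erased between the two calls.
  std::unordered_map<int, std::string> NonLocalDefsCache;

public:
  // Local query. The caller says where the closest invariant.group def is;
  // if it is in another block, report NonLocal and stash the def.
  DepResult getLocal(int QueryInst, const std::string &Def, bool DefInSameBlock) {
    if (DefInSameBlock)
      return {DepKind::Def, Def};
    NonLocalDefsCache.try_emplace(QueryInst, Def);
    return {DepKind::NonLocal, ""};
  }

  // Non-local query: hand out the cached def once, then forget it.
  std::optional<std::string> getNonLocal(int QueryInst) {
    auto It = NonLocalDefsCache.find(QueryInst);
    if (It == NonLocalDefsCache.end())
      return std::nullopt;  // the real code falls through to its usual walk
    std::string Def = std::move(It->second);
    NonLocalDefsCache.erase(It);
    return Def;
  }

  // Entries left behind when a caller never issues the non-local query.
  std::size_t pendingEntries() const { return NonLocalDefsCache.size(); }
};
```

The model also shows the two hazards raised in the review: an entry stays in the map if the caller never issues the non-local query (pendingEntries), and nothing invalidates a cached entry if its instruction is removed in between.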

Prazek added a reviewer: rjmccall. Dec 30 2016, 11:53 PM

Finding the closest dependency

Small format

davide added inline comments. Jan 3 2017, 6:14 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
371–374	(On Diff #82871)	`if (Best == nullptr || Dt.dominates()) return Other;`
422–425	(On Diff #82871)	You probably don't need braces, right?
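A toy illustration of why picking the closest dominating candidate removes the use-list order dependence reames pointed out: every candidate dominates the query, so all of them lie on one dominator chain, and keeping the one dominated by all others yields the same answer for every visiting order, while "first found" does not. `ToyDomTree`, `closestDependency` and `firstDependency` are hypothetical names; the selection rule follows the GetClosestDependency lambda in the committed diff.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy dominator tree: IDom[N] is the immediate dominator of
// node N, and the entry node has IDom == -1.
struct ToyDomTree {
  std::vector<int> IDom;

  // A dominates B if A is B or an ancestor of B. Walking up is O(depth).
  bool dominates(int A, int B) const {
    for (int N = B; N != -1; N = IDom[N])
      if (N == A)
        return true;
    return false;
  }
};

// Selection rule of the patch's GetClosestDependency lambda: replace Best
// whenever it dominates the new candidate. Because every candidate dominates
// the query, they all lie on one dominator chain, so the survivor is the
// candidate closest to the query, whatever the visiting order.
int closestDependency(const ToyDomTree &DT, const std::vector<int> &Candidates) {
  int Best = -1;  // -1 plays the role of nullptr
  for (int Other : Candidates)
    if (Best == -1 || DT.dominates(Best, Other))
      Best = Other;
  return Best;
}

// The behaviour reames objected to: whichever candidate the use list yields first.
int firstDependency(const std::vector<int> &Candidates) {
  return Candidates.empty() ? -1 : Candidates.front();
}
```

For example, with the chain 0 -> 1 -> 2 -> 3 plus a side branch 1 -> 4, query node 3 and candidates {0, 1, 2}, every permutation of the candidates selects 2, whereas the first-found rule returns whichever candidate happens to come first.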

Addressed Davide's comments

ping
I would like to ship it in LLVM 4.0

ping

Note that there was a discussion over email that Phabricator didn't capture.

rebase

Ping, I would really like to land it in 4.0

rebase

The N^2 loop worries me a lot, because it means I can easily construct cases where you've made MemDep N^3 or N^4 in the case of invariant.group dependencies.

But i guess we'll see.

lib/Analysis/MemoryDependenceAnalysis.cpp
333	(On Diff #83945)	Please try to separate out the renaming from the actual logic changes.
340	(On Diff #83945)	Err, doesn't it simply indicate there may be a non-local def because it hit the top of the block? (That's what it indicates for everything else.) How does it indicate there must be one?
379	(On Diff #83945)	This loop is N^2, because each of these dominates calls may take O(N) time.
429	(On Diff #83945)	As long as you are willing to commit to doing this right, I'm okay with it.
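A sketch of the cost being discussed, assuming intra-block dominance is answered by scanning the block so that each dominates call is O(N); the second helper shows the general remedy of numbering a block once so that each later query is O(1). Both helpers are hypothetical and work on plain integer ids rather than LLVM instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical model: a basic block is a list of instruction ids.
// "A dominates B" inside one block means A appears no later than B.
// Answering it by scanning costs O(N) per query, so calling it for each of
// up to N candidates gives the O(N^2) behaviour dberlin describes.
bool dominatesByScan(const std::vector<int> &Block, int A, int B) {
  for (int I : Block) {
    if (I == A)
      return true;   // reached A first (this also covers A == B)
    if (I == B)
      return false;  // reached B first
  }
  return false;      // neither instruction is in the block
}

// Remedy: number the block once in O(N); afterwards each query is O(1).
class OrderedBlock {
  std::unordered_map<int, std::size_t> Pos;

public:
  explicit OrderedBlock(const std::vector<int> &Block) {
    for (std::size_t I = 0; I < Block.size(); ++I)
      Pos[Block[I]] = I;
  }

  // Requires both ids to be in the block (at() throws otherwise).
  bool dominates(int A, int B) const { return Pos.at(A) <= Pos.at(B); }
};
```

The numbering has to be rebuilt or updated whenever the block changes, which is the usual price of this kind of cache.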

In D28137#642786, @dberlin wrote:

The N^2 loop worries me a lot, because it means I can easily construct cases where you've made MemDep N^3 or N^4 in the case of invariant.group dependencies.

But i guess we'll see.

Yeah, but at least for clang it won't matter, because it doesn't use invariant.group by default. I will make sure MemorySSA supports it in constant time before turning it on by default.

Good news. I asked Teresa Johnson to run a few benchmarks for me, and these are the results she got:

"
Here are the results (ran 3x on ivybridge, showing both aggregated and non-aggregated improvements so it is easier to filter out noise). Looks like a real win in povray. The omnetpp improvement looks like noise, ditto for xalancbmk. Not sure about namd.
Teresa

Reference: D28137_0 + D28137_1 + D28137_2
(1): D28137_strictvtableptrs_0 + D28137_strictvtableptrs_1 + D28137_strictvtableptrs_2

           Benchmark             Base:Reference   (1)  
-------------------------------------------------------
spec/2006/fp/C++/444.namd                 25.51  +0.86%
spec/2006/fp/C++/447.dealII               43.52  +0.44%
spec/2006/fp/C++/450.soplex               41.91  +0.13%
spec/2006/fp/C++/453.povray               35.35  +4.50%
spec/2006/int/C++/471.omnetpp             26.11  +2.71%
spec/2006/int/C++/473.astar               20.52  +0.21%
spec/2006/int/C++/483.xalancbmk           34.58  +1.18%

geometric mean                                   +1.42% 


Reference: D28137_0 + D28137_1 + D28137_2
(1): D28137_0
(2): D28137_1
(3): D28137_2
(4): D28137_strictvtableptrs_0
(5): D28137_strictvtableptrs_1
(6): D28137_strictvtableptrs_2

           Benchmark             Base:Reference   (1)     (2)     (3)     (4)     (5)      (6)  
------------------------------------------------------------------------------------------------
spec/2006/fp/C++/444.namd                 25.51  -0.55%  +0.08%  +0.47%  +0.94%  +1.14%   +0.51%
spec/2006/fp/C++/447.dealII               43.52  +0.15%  -0.58%  +0.43%  +0.70%  +0.68%   -0.08%
spec/2006/fp/C++/450.soplex               41.91  -1.46%  +1.36%  +0.10%  +0.36%  -0.84%   +0.86%
spec/2006/fp/C++/453.povray               35.35  -0.88%  +0.25%  +0.62%  +3.76%  +4.47%   +5.26%
spec/2006/int/C++/471.omnetpp             26.11  +1.33%  -0.82%  -0.51%  -1.09%  -1.16%  +10.37%
spec/2006/int/C++/473.astar               20.52  -0.36%  -0.31%  +0.67%  +0.13%  +0.28%   +0.23%
spec/2006/int/C++/483.xalancbmk           34.58  -0.99%  -0.36%  +1.35%  +0.80%  -0.82%   +3.55%

geometric mean                                   -0.40%  -0.06%  +0.44%  +0.79%  +0.52%   +2.90%

Before this patch and the handling of GEPs with all-zero indices, there was a 2% regression in povray, so this is a very good sign!
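The "geometric mean" rows can be checked with a short calculation. This sketch assumes the mean is taken over the ratios (1 + p/100) and converted back to a percentage; run on the rounded per-benchmark values, it gives about +1.42% for the first table and about +2.90% for column (6) of the second, matching the reported figures to within rounding.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of percentage changes, taken over the ratios (1 + p/100)
// and converted back to a percentage. Assumes Percents is non-empty.
double geomeanPercent(const std::vector<double> &Percents) {
  double LogSum = 0.0;
  for (double P : Percents)
    LogSum += std::log(1.0 + P / 100.0);
  return (std::exp(LogSum / Percents.size()) - 1.0) * 100.0;
}
```

For example, geomeanPercent({0.86, 0.44, 0.13, 4.50, 2.71, 0.21, 1.18}) comes out at roughly 1.42. Averaging in log space keeps a single large outlier (such as the +10.37% for omnetpp) from dominating the way it would in an arithmetic mean of ratios.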

Prazek added inline comments. Jan 11 2017, 8:16 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
340	(On Diff #83945)	Because this is what getInvariantGroupPointerDependency does, and what its documentation says:

/// metadata and the same pointer operand. Returns Unknown if it does not
/// find anything, and Def if it can be assumed that 2 instructions load or
/// store the same value and NonLocal which indicate that non-local Def was
/// found, which can be retrieved by calling getNonLocalPointerDependency
/// with the same queried instruction.

379	(On Diff #83945)	OK, I will leave a FIXME.
429	(On Diff #83945)	Yep, implementing it in MSSA sounds like real fun!

rebase typos

dberlin added inline comments. Jan 11 2017, 10:40 AM

lib/Analysis/MemoryDependenceAnalysis.cpp
340	(On Diff #83945)	Right, I'm suggesting that this is different from all the other local pointer dependency getters, and thus confusing. The documentation of NonLocal says:

/// This marker indicates that the query has no dependency in the specified
/// block.
///
/// To find out more, the client should query other predecessor blocks.
NonLocal = 1,
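To see the ambiguity dberlin describes, here is a hypothetical model of the order of checks in the committed getPointerDependencyFrom, with MemDepResult reduced to a plain enum. A non-local invariant.group result is returned ahead of a local Clobber, so a caller that receives NonLocal cannot tell whether it means "a Def exists in another block" or, as for the other getters, "nothing was found in this block".

```cpp
#include <cassert>

// Hypothetical reduction of MemDepResult to the kinds used here.
enum class Kind { Def, Clobber, NonLocal, Unknown };

// Order of checks in the committed getPointerDependencyFrom:
//  1. a Def found through invariant.group in this block wins;
//  2. otherwise a Def found by the ordinary backwards scan;
//  3. otherwise an invariant.group NonLocal, which the patch only produces
//     when a Def exists in some other block;
//  4. otherwise whatever the ordinary scan returned (Clobber, NonLocal, ...).
Kind combine(Kind InvariantGroupDep, Kind SimpleDep) {
  if (InvariantGroupDep == Kind::Def)
    return InvariantGroupDep;
  if (SimpleDep == Kind::Def)
    return SimpleDep;
  if (InvariantGroupDep == Kind::NonLocal)
    return InvariantGroupDep;
  assert(InvariantGroupDep == Kind::Unknown &&
         "invariant.group lookup yields only Def, NonLocal or Unknown");
  return SimpleDep;
}
```

combine(Kind::NonLocal, Kind::Clobber) and combine(Kind::Unknown, Kind::NonLocal) both return NonLocal, but for different reasons, which is exactly the point of confusion.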

updated comments

Closed by commit rL291762: [Devirtualization] MemDep returns non-local !invariant.group dependencies (authored by Prazek). Jan 12 2017, 3:45 AM

This revision was automatically updated to reflect the committed changes.

Just for the record: Teresa ran the benchmarks on trunk instead of trunk+patch. After applying the patch there were some regressions, but they might be caused by other changes that went in in the meantime, so there is no clear verdict.

Revision Contents

Path	Size
llvm/trunk/include/llvm/Analysis/MemoryDependenceAnalysis.h	10 lines
llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp	63 lines
llvm/trunk/test/Transforms/GVN/assume-equal.ll	8 lines
llvm/trunk/test/Transforms/GVN/invariant.group.ll	38 lines
llvm/trunk/test/Transforms/NewGVN/assume-equal.ll	8 lines
llvm/trunk/test/Transforms/NewGVN/invariant.group.ll	39 lines

Diff 84102

llvm/trunk/include/llvm/Analysis/MemoryDependenceAnalysis.h

(296 lines not shown)	struct NonLocalPointerInfo {
/// The AA tags associated with dereferences of the pointer.		/// The AA tags associated with dereferences of the pointer.
///		///
/// The members may be null if there are no tags or conflicting tags.		/// The members may be null if there are no tags or conflicting tags.
AAMDNodes AATags;		AAMDNodes AATags;

NonLocalPointerInfo() : Size(MemoryLocation::UnknownSize) {}		NonLocalPointerInfo() : Size(MemoryLocation::UnknownSize) {}
};		};

		/// Cache storing single nonlocal def for the instruction.
		/// It is set when nonlocal def would be found in function returning only
		/// local dependencies.
		DenseMap<Instruction *, NonLocalDepResult> NonLocalDefsCache;
/// This map stores the cached results of doing a pointer lookup at the		/// This map stores the cached results of doing a pointer lookup at the
/// bottom of a block.		/// bottom of a block.
///		///
/// The key of this map is the pointer+isload bit, the value is a list of		/// The key of this map is the pointer+isload bit, the value is a list of
/// <bb->result> mappings.		/// <bb->result> mappings.
typedef DenseMap<ValueIsLoadPair, NonLocalPointerInfo>		typedef DenseMap<ValueIsLoadPair, NonLocalPointerInfo>
CachedNonLocalPointerInfo;		CachedNonLocalPointerInfo;
CachedNonLocalPointerInfo NonLocalPointerDeps;		CachedNonLocalPointerInfo NonLocalPointerDeps;
(123 lines not shown)	MemDepResult getSimplePointerDependencyFrom(const MemoryLocation &MemLoc,
BasicBlock::iterator ScanIt,		BasicBlock::iterator ScanIt,
BasicBlock *BB,		BasicBlock *BB,
Instruction *QueryInst,		Instruction *QueryInst,
unsigned *Limit = nullptr);		unsigned *Limit = nullptr);

/// This analysis looks for other loads and stores with invariant.group		/// This analysis looks for other loads and stores with invariant.group
/// metadata and the same pointer operand. Returns Unknown if it does not		/// metadata and the same pointer operand. Returns Unknown if it does not
/// find anything, and Def if it can be assumed that 2 instructions load or		/// find anything, and Def if it can be assumed that 2 instructions load or
/// store the same value.		/// store the same value and NonLocal which indicate that non-local Def was
/// FIXME: This analysis works only on single block because of restrictions		/// found, which can be retrieved by calling getNonLocalPointerDependency
/// at the call site.		/// with the same queried instruction.
MemDepResult getInvariantGroupPointerDependency(LoadInst *LI, BasicBlock *BB);		MemDepResult getInvariantGroupPointerDependency(LoadInst *LI, BasicBlock *BB);

/// Looks at a memory location for a load (specified by MemLocBase, Offs, and		/// Looks at a memory location for a load (specified by MemLocBase, Offs, and
/// Size) and compares it against a load.		/// Size) and compares it against a load.
///		///
/// If the specified load could be safely widened to a larger integer load		/// If the specified load could be safely widened to a larger integer load
/// that is 1) still efficient, 2) safe for the target, and 3) would provide		/// that is 1) still efficient, 2) safe for the target, and 3) would provide
/// the specified memory location value, then this function returns the size		/// the specified memory location value, then this function returns the size
(69 lines not shown)

llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp

(317 lines not shown)	else if (AtomicCmpXchgInst *AI = dyn_cast<AtomicCmpXchgInst>(Inst))
return AI->isVolatile();		return AI->isVolatile();
return false;		return false;
}		}

MemDepResult MemoryDependenceResults::getPointerDependencyFrom(		MemDepResult MemoryDependenceResults::getPointerDependencyFrom(
const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,		const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {		BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {

		MemDepResult InvariantGroupDependency = MemDepResult::getUnknown();
if (QueryInst != nullptr) {		if (QueryInst != nullptr) {
if (auto *LI = dyn_cast<LoadInst>(QueryInst)) {		if (auto *LI = dyn_cast<LoadInst>(QueryInst)) {
MemDepResult InvariantGroupDependency =		InvariantGroupDependency = getInvariantGroupPointerDependency(LI, BB);
getInvariantGroupPointerDependency(LI, BB);

if (InvariantGroupDependency.isDef())		if (InvariantGroupDependency.isDef())
return InvariantGroupDependency;		return InvariantGroupDependency;
}		}
}		}
return getSimplePointerDependencyFrom(MemLoc, isLoad, ScanIt, BB, QueryInst,		MemDepResult SimpleDep = getSimplePointerDependencyFrom(
Limit);		MemLoc, isLoad, ScanIt, BB, QueryInst, Limit);
		if (SimpleDep.isDef())
		return SimpleDep;
		// Non-local invariant group dependency indicates there is non local Def
		// (it only returns nonLocal if it finds nonLocal def), which is better than
		// local clobber and everything else.
		if (InvariantGroupDependency.isNonLocal())
		return InvariantGroupDependency;

		assert(InvariantGroupDependency.isUnknown() &&
		"InvariantGroupDependency should be only unknown at this point");
		return SimpleDep;
}		}

MemDepResult		MemDepResult
MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,		MemoryDependenceResults::getInvariantGroupPointerDependency(LoadInst *LI,
BasicBlock *BB) {		BasicBlock *BB) {

auto *InvariantGroupMD = LI->getMetadata(LLVMContext::MD_invariant_group);		auto *InvariantGroupMD = LI->getMetadata(LLVMContext::MD_invariant_group);
if (!InvariantGroupMD)		if (!InvariantGroupMD)
return MemDepResult::getUnknown();		return MemDepResult::getUnknown();

// Take the ptr operand after all casts and geps 0. This way we can search		// Take the ptr operand after all casts and geps 0. This way we can search
// cast graph down only.		// cast graph down only.
Value *LoadOperand = LI->getPointerOperand()->stripPointerCasts();		Value *LoadOperand = LI->getPointerOperand()->stripPointerCasts();

// It's is not safe to walk the use list of global value, because function		// It's is not safe to walk the use list of global value, because function
// passes aren't allowed to look outside their functions.		// passes aren't allowed to look outside their functions.
// FIXME: this could be fixed by filtering instructions from outside		// FIXME: this could be fixed by filtering instructions from outside
// of current function.		// of current function.
if (isa<GlobalValue>(LoadOperand))		if (isa<GlobalValue>(LoadOperand))
return MemDepResult::getUnknown();		return MemDepResult::getUnknown();

// Queue to process all pointers that are equivalent to load operand.		// Queue to process all pointers that are equivalent to load operand.
SmallVector<const Value *, 8> LoadOperandsQueue;		SmallVector<const Value *, 8> LoadOperandsQueue;
LoadOperandsQueue.push_back(LoadOperand);		LoadOperandsQueue.push_back(LoadOperand);

		Instruction *ClosestDependency = nullptr;
		// Order of instructions in uses list is unpredictible. In order to always
		// get the same result, we will look for the closest dominance.
		auto GetClosestDependency = [this](Instruction *Best, Instruction *Other) {
		assert(Other && "Must call it with not null instruction");
		if (Best == nullptr || DT.dominates(Best, Other))
		return Other;
		return Best;
		};


		// FIXME: This loop is O(N^2) because dominates can be O(n) and in worst case
		// we will see all the instructions. This should be fixed in MSSA.
while (!LoadOperandsQueue.empty()) {		while (!LoadOperandsQueue.empty()) {
const Value *Ptr = LoadOperandsQueue.pop_back_val();		const Value *Ptr = LoadOperandsQueue.pop_back_val();
assert(Ptr && !isa<GlobalValue>(Ptr) &&		assert(Ptr && !isa<GlobalValue>(Ptr) &&
"Null or GlobalValue should not be inserted");		"Null or GlobalValue should not be inserted");

for (const Use &Us : Ptr->uses()) {		for (const Use &Us : Ptr->uses()) {
auto *U = dyn_cast<Instruction>(Us.getUser());		auto *U = dyn_cast<Instruction>(Us.getUser());
if (!U || U == LI || !DT.dominates(U, LI))		if (!U || U == LI || !DT.dominates(U, LI))
(14 lines not shown)	for (const Use &Us : Ptr->uses()) {
if (GEP->hasAllZeroIndices()) {		if (GEP->hasAllZeroIndices()) {
LoadOperandsQueue.push_back(U);		LoadOperandsQueue.push_back(U);
continue;		continue;
}		}

// If we hit load/store with the same invariant.group metadata (and the		// If we hit load/store with the same invariant.group metadata (and the
// same pointer operand) we can assume that value pointed by pointer		// same pointer operand) we can assume that value pointed by pointer
// operand didn't change.		// operand didn't change.
if ((isa<LoadInst>(U) || isa<StoreInst>(U)) && U->getParent() == BB &&		if ((isa<LoadInst>(U) || isa<StoreInst>(U)) &&
U->getMetadata(LLVMContext::MD_invariant_group) == InvariantGroupMD)		U->getMetadata(LLVMContext::MD_invariant_group) == InvariantGroupMD)
return MemDepResult::getDef(U);		ClosestDependency = GetClosestDependency(ClosestDependency, U);
}		}
}		}

		if (!ClosestDependency)
return MemDepResult::getUnknown();		return MemDepResult::getUnknown();
		if (ClosestDependency->getParent() == BB)
		return MemDepResult::getDef(ClosestDependency);
		// Def(U) can't be returned here because it is non-local. If local
		// dependency won't be found then return nonLocal counting that the
		// user will call getNonLocalPointerDependency, which will return cached
		// result.
		NonLocalDefsCache.try_emplace(
		LI, NonLocalDepResult(ClosestDependency->getParent(),
		MemDepResult::getDef(ClosestDependency), nullptr));
		return MemDepResult::getNonLocal();
}		}

MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(		MemDepResult MemoryDependenceResults::getSimplePointerDependencyFrom(
const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,		const MemoryLocation &MemLoc, bool isLoad, BasicBlock::iterator ScanIt,
BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {		BasicBlock *BB, Instruction *QueryInst, unsigned *Limit) {
bool isInvariantLoad = false;		bool isInvariantLoad = false;

if (!Limit) {		if (!Limit) {
(467 lines not shown)	void MemoryDependenceResults::getNonLocalPointerDependency(
const MemoryLocation Loc = MemoryLocation::get(QueryInst);		const MemoryLocation Loc = MemoryLocation::get(QueryInst);
bool isLoad = isa<LoadInst>(QueryInst);		bool isLoad = isa<LoadInst>(QueryInst);
BasicBlock *FromBB = QueryInst->getParent();		BasicBlock *FromBB = QueryInst->getParent();
assert(FromBB);		assert(FromBB);

assert(Loc.Ptr->getType()->isPointerTy() &&		assert(Loc.Ptr->getType()->isPointerTy() &&
"Can't get pointer deps of a non-pointer!");		"Can't get pointer deps of a non-pointer!");
Result.clear();		Result.clear();
		{
		// Check if there is cached Def with invariant.group. FIXME: cache might be
		// invalid if cached instruction would be removed between call to
		// getPointerDependencyFrom and this function.
		auto NonLocalDefIt = NonLocalDefsCache.find(QueryInst);
		if (NonLocalDefIt != NonLocalDefsCache.end()) {
		Result.push_back(std::move(NonLocalDefIt->second));
		NonLocalDefsCache.erase(NonLocalDefIt);
		return;
		}
		}
// This routine does not expect to deal with volatile instructions.		// This routine does not expect to deal with volatile instructions.
// Doing so would require piping through the QueryInst all the way through.		// Doing so would require piping through the QueryInst all the way through.
// TODO: volatiles can't be elided, but they can be reordered with other		// TODO: volatiles can't be elided, but they can be reordered with other
// non-volatile accesses.		// non-volatile accesses.

// We currently give up on any instruction which is ordered, but we do handle		// We currently give up on any instruction which is ordered, but we do handle
// atomic instructions which are unordered.		// atomic instructions which are unordered.
// TODO: Handle ordered instructions		// TODO: Handle ordered instructions
(836 lines not shown)

llvm/trunk/test/Transforms/GVN/assume-equal.ll

(59 lines not shown)	if.then: ; preds = %entry
%vtable1.cast = bitcast i8** %vtable to i32 (%struct.A*)**		%vtable1.cast = bitcast i8** %vtable to i32 (%struct.A*)**
%2 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable1.cast, align 8		%2 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable1.cast, align 8

; CHECK: call i32 @_ZN1A3fooEv(		; CHECK: call i32 @_ZN1A3fooEv(
%call2 = tail call i32 %2(%struct.A* %0) #1		%call2 = tail call i32 %2(%struct.A* %0) #1
%vtable1 = load i8**, i8*** %1, align 8, !invariant.group !0		%vtable1 = load i8**, i8*** %1, align 8, !invariant.group !0
%vtable2.cast = bitcast i8** %vtable1 to i32 (%struct.A*)**		%vtable2.cast = bitcast i8** %vtable1 to i32 (%struct.A*)**
%call1 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable2.cast, align 8		%call1 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable2.cast, align 8
; FIXME: those loads could be also direct, but right now the invariant.group		; CHECK: call i32 @_ZN1A3fooEv(
; analysis works only on single block
; CHECK-NOT: call i32 @_ZN1A3fooEv(
%callx = tail call i32 %call1(%struct.A* %0) #1		%callx = tail call i32 %call1(%struct.A* %0) #1

%vtable2 = load i8**, i8*** %1, align 8, !invariant.group !0		%vtable2 = load i8**, i8*** %1, align 8, !invariant.group !0
%vtable3.cast = bitcast i8** %vtable2 to i32 (%struct.A*)**		%vtable3.cast = bitcast i8** %vtable2 to i32 (%struct.A*)**
%call4 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable3.cast, align 8		%call4 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable3.cast, align 8
; CHECK-NOT: call i32 @_ZN1A3fooEv(		; CHECK: call i32 @_ZN1A3fooEv(
%cally = tail call i32 %call4(%struct.A* %0) #1		%cally = tail call i32 %call4(%struct.A* %0) #1

%b = bitcast i8* %call to %struct.A**		%b = bitcast i8* %call to %struct.A**
%vtable3 = load %struct.A*, %struct.A** %b, align 8, !invariant.group !0		%vtable3 = load %struct.A*, %struct.A** %b, align 8, !invariant.group !0
%vtable4.cast = bitcast %struct.A* %vtable3 to i32 (%struct.A*)**		%vtable4.cast = bitcast %struct.A* %vtable3 to i32 (%struct.A*)**
%vfun = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable4.cast, align 8		%vfun = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable4.cast, align 8
; CHECK-NOT: call i32 @_ZN1A3fooEv(		; CHECK: call i32 @_ZN1A3fooEv(
%unknown = tail call i32 %vfun(%struct.A* %0) #1		%unknown = tail call i32 %vfun(%struct.A* %0) #1

br label %if.end		br label %if.end

if.else: ; preds = %entry		if.else: ; preds = %entry
%vfn47 = getelementptr inbounds i8*, i8** %vtable, i64 1		%vfn47 = getelementptr inbounds i8*, i8** %vtable, i64 1
%vfn4 = bitcast i8** %vfn47 to i32 (%struct.A*)**		%vfn4 = bitcast i8** %vfn47 to i32 (%struct.A*)**

(184 lines not shown)

llvm/trunk/test/Transforms/GVN/invariant.group.ll

	(386 lines not shown)
   ; CHECK-NEXT: call void @fooBit(i1* %b0, i1 %trunc)
   call void @fooBit(i1* %b0, i1 %2)
   %3 = load i1, i1* %b0, !invariant.group !0
   ; CHECK-NEXT: call void @fooBit(i1* %b0, i1 %trunc)
   call void @fooBit(i1* %b0, i1 %3)
   ret void
 }

+; CHECK-LABEL: define void @handling_loops()
+define void @handling_loops() {
+  %a = alloca %struct.A, align 8
+  %1 = bitcast %struct.A* %a to i8*
+  %2 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0
+  store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*], [3 x i8*]* @_ZTV1A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %2, align 8, !invariant.group !0
+  %3 = load i8, i8* @unknownPtr, align 4
+  %4 = icmp sgt i8 %3, 0
+  br i1 %4, label %.lr.ph.i, label %_Z2g2R1A.exit
+
+.lr.ph.i: ; preds = %0
+  %5 = bitcast %struct.A* %a to void (%struct.A*)***
+  %6 = load i8, i8* @unknownPtr, align 4
+  %7 = icmp sgt i8 %6, 1
+  br i1 %7, label %._crit_edge.preheader, label %_Z2g2R1A.exit
+
+._crit_edge.preheader: ; preds = %.lr.ph.i
+  br label %._crit_edge
+
+._crit_edge: ; preds = %._crit_edge.preheader, %._crit_edge
+  %8 = phi i8 [ %10, %._crit_edge ], [ 1, %._crit_edge.preheader ]
+  %.pre = load void (%struct.A*)**, void (%struct.A*)*** %5, align 8, !invariant.group !0
+  %9 = load void (%struct.A*)*, void (%struct.A*)** %.pre, align 8
+  ; CHECK: call void @_ZN1A3fooEv(%struct.A* nonnull %a)
+  call void %9(%struct.A* nonnull %a) #3
+  ; CHECK-NOT: call void %
+  %10 = add nuw nsw i8 %8, 1
+  %11 = load i8, i8* @unknownPtr, align 4
+  %12 = icmp slt i8 %10, %11
+  br i1 %12, label %._crit_edge, label %_Z2g2R1A.exit.loopexit
+
+_Z2g2R1A.exit.loopexit: ; preds = %._crit_edge
+  br label %_Z2g2R1A.exit
+
+_Z2g2R1A.exit: ; preds = %_Z2g2R1A.exit.loopexit, %.lr.ph.i, %0
+  ret void
+}
+

 declare void @foo(i8*)
 declare void @foo2(i8*, i8)
 declare void @bar(i8)
 declare i8* @getPointer(i8*)
 declare void @_ZN1A3fooEv(%struct.A*)
 declare void @_ZN1AC1Ev(%struct.A*)
 declare void @fooBit(i1*, i1)
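
In @handling_loops the !invariant.group store is in the entry block %0, while the load %.pre that feeds the indirect call is in ._crit_edge, which is also the target of the loop's back edge. Because that dependency crosses blocks, it can only be reached through the non-local query, and since the entry block dominates ._crit_edge, the back edge does not get in the way. A small sketch of that dominance fact follows, using the textbook iterative set-intersection dominator computation as an illustrative stand-in (LLVM builds its DominatorTree with a different, faster algorithm). The block numbering (0 = %0, 1 = .lr.ph.i, 2 = ._crit_edge.preheader, 3 = ._crit_edge, 4 = _Z2g2R1A.exit.loopexit, 5 = _Z2g2R1A.exit) and the name computeDominators are assumptions of this example.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Iterative dominator sets: Dom(entry) = {entry}; for every other block b,
// Dom(b) = {b} plus the intersection of Dom(p) over all predecessors p.
// Repeats until a fixed point. preds[b] lists the predecessors of block b;
// block 0 is the entry and every block is assumed reachable from it.
std::vector<std::set<int>> computeDominators(
    const std::vector<std::vector<int>>& preds) {
  assert(!preds.empty());
  const int n = static_cast<int>(preds.size());
  std::set<int> all;
  for (int b = 0; b < n; ++b) all.insert(b);

  std::vector<std::set<int>> dom(n, all);  // start from "everything dominates"
  dom[0] = {0};

  bool changed = true;
  while (changed) {
    changed = false;
    for (int b = 1; b < n; ++b) {
      std::set<int> meet = all;
      for (int p : preds[b]) {  // intersect the predecessors' dominator sets
        std::set<int> kept;
        for (int d : meet)
          if (dom[p].count(d) != 0) kept.insert(d);
        meet = std::move(kept);
      }
      meet.insert(b);  // every block dominates itself
      if (meet != dom[b]) {
        dom[b] = std::move(meet);
        changed = true;
      }
    }
  }
  return dom;
}
```

With preds = {{}, {0}, {1}, {2, 3}, {3}, {4, 1, 0}} for those blocks, the dominators of block 3 come out as {0, 1, 2, 3}, so the store's block 0 dominates the load's block, while the exit block 5 is dominated only by {0, 5}.
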

llvm/trunk/test/Transforms/NewGVN/assume-equal.ll

 if.then: ; preds = %entry
   %vtable1.cast = bitcast i8** %vtable to i32 (%struct.A*)**
   %2 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable1.cast, align 8

   ; CHECK: call i32 @_ZN1A3fooEv(
   %call2 = tail call i32 %2(%struct.A* %0) #1
   %vtable1 = load i8**, i8*** %1, align 8, !invariant.group !0
   %vtable2.cast = bitcast i8** %vtable1 to i32 (%struct.A*)**
   %call1 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable2.cast, align 8
-  ; FIXME: those loads could be also direct, but right now the invariant.group
-  ; analysis works only on single block
-  ; CHECK-NOT: call i32 @_ZN1A3fooEv(
+  ; CHECK: call i32 @_ZN1A3fooEv(
   %callx = tail call i32 %call1(%struct.A* %0) #1

   %vtable2 = load i8**, i8*** %1, align 8, !invariant.group !0
   %vtable3.cast = bitcast i8** %vtable2 to i32 (%struct.A*)**
   %call4 = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable3.cast, align 8
-  ; CHECK-NOT: call i32 @_ZN1A3fooEv(
+  ; CHECK: call i32 @_ZN1A3fooEv(
   %cally = tail call i32 %call4(%struct.A* %0) #1

   %b = bitcast i8* %call to %struct.A**
   %vtable3 = load %struct.A*, %struct.A** %b, align 8, !invariant.group !0
   %vtable4.cast = bitcast %struct.A* %vtable3 to i32 (%struct.A*)**
   %vfun = load i32 (%struct.A*)*, i32 (%struct.A*)** %vtable4.cast, align 8
-  ; CHECK-NOT: call i32 @_ZN1A3fooEv(
+  ; CHECK: call i32 @_ZN1A3fooEv(
   %unknown = tail call i32 %vfun(%struct.A* %0) #1

   br label %if.end

 if.else: ; preds = %entry
   %vfn47 = getelementptr inbounds i8*, i8** %vtable, i64 1
   %vfn4 = bitcast i8** %vfn47 to i32 (%struct.A*)**


llvm/trunk/test/Transforms/NewGVN/invariant.group.ll

   ; CHECK-NEXT: call void @fooBit(i1* %b0, i1 %trunc)
   call void @fooBit(i1* %b0, i1 %2)
   %3 = load i1, i1* %b0, !invariant.group !0
   ; CHECK-NEXT: call void @fooBit(i1* %b0, i1 %trunc)
   call void @fooBit(i1* %b0, i1 %3)
   ret void
 }

+; CHECK-LABEL: define void @handling_loops()
+define void @handling_loops() {
+  %a = alloca %struct.A, align 8
+  %1 = bitcast %struct.A* %a to i8*
+  %2 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0
+  store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*], [3 x i8*]* @_ZTV1A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %2, align 8, !invariant.group !0
+  %3 = load i8, i8* @unknownPtr, align 4
+  %4 = icmp sgt i8 %3, 0
+  br i1 %4, label %.lr.ph.i, label %_Z2g2R1A.exit
+
+.lr.ph.i: ; preds = %0
+  %5 = bitcast %struct.A* %a to void (%struct.A*)***
+  %6 = load i8, i8* @unknownPtr, align 4
+  %7 = icmp sgt i8 %6, 1
+  br i1 %7, label %._crit_edge.preheader, label %_Z2g2R1A.exit
+
+._crit_edge.preheader: ; preds = %.lr.ph.i
+  br label %._crit_edge
+
+._crit_edge: ; preds = %._crit_edge.preheader, %._crit_edge
+  %8 = phi i8 [ %10, %._crit_edge ], [ 1, %._crit_edge.preheader ]
+  %.pre = load void (%struct.A*)**, void (%struct.A*)*** %5, align 8, !invariant.group !0
+  %9 = load void (%struct.A*)*, void (%struct.A*)** %.pre, align 8
+  ; CHECK: call void @_ZN1A3fooEv(%struct.A* nonnull %a)
+  call void %9(%struct.A* nonnull %a) #3
+
+  ; CHECK-NOT: call void %
+  %10 = add nuw nsw i8 %8, 1
+  %11 = load i8, i8* @unknownPtr, align 4
+  %12 = icmp slt i8 %10, %11
+  br i1 %12, label %._crit_edge, label %_Z2g2R1A.exit.loopexit
+
+_Z2g2R1A.exit.loopexit: ; preds = %._crit_edge
+  br label %_Z2g2R1A.exit
+
+_Z2g2R1A.exit: ; preds = %_Z2g2R1A.exit.loopexit, %.lr.ph.i, %0
+  ret void
+}
+

 declare void @foo(i8*)
 declare void @foo2(i8*, i8)
 declare void @bar(i8)
 declare i8* @getPointer(i8*)
 declare void @_ZN1A3fooEv(%struct.A*)
 declare void @_ZN1AC1Ev(%struct.A*)
 declare void @fooBit(i1*, i1)

Revision Contents

Diff 84102

llvm/trunk/include/llvm/Analysis/MemoryDependenceAnalysis.h

llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp

llvm/trunk/test/Transforms/GVN/assume-equal.ll

llvm/trunk/test/Transforms/GVN/invariant.group.ll

llvm/trunk/test/Transforms/NewGVN/assume-equal.ll

llvm/trunk/test/Transforms/NewGVN/invariant.group.ll
