This is an archive of the discontinued LLVM Phabricator instance.

Differential D36352

[LCG] Switch one of the update methods for the LazyCallGraph to support limited batch updates.
ClosedPublic

Authored by chandlerc on Aug 5 2017, 5:21 AM.

Details

Reviewers

sanjoy
davide
silvas

Commits

rG23c2f44cc7b8: [LCG] Switch one of the update methods for the LazyCallGraph to support limited…
rL310450: [LCG] Switch one of the update methods for the LazyCallGraph to support

Summary

Specifically, allow removing multiple reference edges starting from
a common source node. There are a few constraints that play into
supporting this form of batching:

1) The way updates occur during the CGSCC walk, about the most we can functionally batch together are those with a common source node. This also makes the batching simpler to implement, so it seems a worthwhile restriction.
2) The far and away hottest function for large C++ files I measured (generated code for protocol buffers) showed a huge amount of time was spent removing ref edges specifically, so it seems worth focusing there.
3) The algorithm for removing ref edges is very amenable to this restricted batching. Only the API and implementation special casing for the non-batch case gets in the way. Once that is removed, supporting batches is nearly trivial.

This does modify the API in an interesting way -- now, we only preserve
the target RefSCC when the RefSCC structure is unchanged. In the face of
any splits, we create brand new RefSCC objects. However, all of the
users were OK with it that I could find. Only the unittest needed
interesting updates here.
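
The batching idea in the summary can be sketched on a toy graph. This is a minimal illustration and not the LazyCallGraph implementation: `Graph`, `tarjanSCCs`, and `removeEdgesBatch` are hypothetical names, nodes are plain integers, and the graph is assumed to start out as a single strongly connected component standing in for one RefSCC. The sketch removes several edges from one source node and then runs a single Tarjan walk. It returns nothing when the component survives and the freshly formed components in postorder otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Toy stand-in for one RefSCC: nodes are 0..N-1, Adj[U] lists ref-edge targets.
using Graph = std::vector<std::vector<int>>;

// Tarjan's algorithm. Components are emitted in postorder: a component is
// appended only after every component reachable from it.
std::vector<std::vector<int>> tarjanSCCs(const Graph &Adj) {
  const int N = static_cast<int>(Adj.size());
  std::vector<int> Index(N, -1), Low(N, 0), Stack;
  std::vector<bool> OnStack(N, false);
  std::vector<std::vector<int>> Result;
  int Next = 0;
  std::function<void(int)> Visit = [&](int U) {
    Index[U] = Low[U] = Next++;
    Stack.push_back(U);
    OnStack[U] = true;
    for (int V : Adj[U]) {
      if (Index[V] < 0) {
        Visit(V);
        Low[U] = std::min(Low[U], Low[V]);
      } else if (OnStack[V]) {
        Low[U] = std::min(Low[U], Index[V]);
      }
    }
    if (Low[U] == Index[U]) { // U is the root of a component.
      std::vector<int> Comp;
      int W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack[W] = false;
        Comp.push_back(W);
      } while (W != U);
      Result.push_back(Comp);
    }
  };
  for (int U = 0; U < N; ++U)
    if (Index[U] < 0)
      Visit(U);
  return Result;
}

// Remove every edge Source -> T for T in Targets, then recompute components
// once. Returns an empty list when the graph is still a single component
// (the "structure is unchanged" case), otherwise all of the new components.
std::vector<std::vector<int>>
removeEdgesBatch(Graph &Adj, int Source, const std::vector<int> &Targets) {
  auto &Out = Adj[Source];
  for (int T : Targets) {
    auto It = std::find(Out.begin(), Out.end(), T);
    assert(It != Out.end() && "edge to remove must exist");
    Out.erase(It);
  }
  auto SCCs = tarjanSCCs(Adj);
  if (SCCs.size() == 1)
    return {};
  return SCCs;
}
```

The point of the sketch is that the cost of the walk is paid once per batch rather than once per removed edge, which is where a common source node makes batching easy.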

How much does batching these updates help? I instrumented the compiler
when run over a very large generated source file for a protocol buffer
and found that the majority of updates are intrinsically updating one
function at a time. However, nearly 40% of the total ref edges removed
are removed as part of a batch of removals greater than one, so these
are the cases batching can help with.

When compiling the IR for this file with 'opt' and 'O3', this patch
reduces the total time by 8-9%.

I'm still working on adding a bit of specific unittest coverage for the batch
part of the API, but wanted to go ahead and send for review as that isn't very
interesting.

Diff Detail

Repository: rL LLVM

Event Timeline

chandlerc created this revision. Aug 5 2017, 5:21 AM

Herald added subscribers: mcrosier, mehdi_amini. Aug 5 2017, 5:21 AM

When compiling the IR for this file with 'opt' and 'O3', this patch reduces the total time by 8-9%.

Just as a point of reference, do you have handy how much of the total compile time are we spending in the PM's CGSCC / LCG stuff? If we can speed up the overall time by 8-9% by improving it then it suggests that the total time is much larger, which I find somewhat surprising (presumably, we should be spending the vast majority of our time inside of the optimizations themselves).

Is this module you're working on just a really pathological case?

Or is this maybe a situation where the denormalized ref edge representation is causing a bunch of extra work?

In D36352#833118, @silvas wrote:

When compiling the IR for this file with 'opt' and 'O3', this patch reduces the total time by 8-9%.

Just as a point of reference, do you have handy how much of the total compile time are we spending in the PM's CGSCC / LCG stuff?

Yep, that's how I got here. Before all of my changes to LCG, we spent just under 40% of the total optimizer time (and nearly that much of the total compile time) in this one method.

Removing the parent sets made this method obnoxiously faster (over 2x) and shaved just over 20% off the total optimizer time.

This patch makes this method another 2x faster, and shaves 8% off the total optimizer time.

After both of these, this method remains the only thing hot really, and it is now between 6-8% depending on the noise in the profile. I'm still looking for more things to make faster here. I think I have another 2-4 things that will make it faster.
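
The relationship between "this method got N times faster" and "total time dropped by X%" follows the usual Amdahl-style arithmetic. The helper names below are hypothetical and the inputs are the rounded figures quoted above, so treat the results as rough consistency checks rather than measurements.

```cpp
#include <cassert>

// Fraction of total time saved when a routine that accounts for `Fraction`
// of the run becomes `Speedup` times faster.
double savedFraction(double Fraction, double Speedup) {
  assert(Fraction >= 0.0 && Fraction <= 1.0 && Speedup > 0.0);
  return Fraction * (1.0 - 1.0 / Speedup);
}

// Inverse: the share of total time the routine must have had for a
// `Speedup`-times improvement to save `Saved` of the total.
double impliedFraction(double Saved, double Speedup) {
  assert(Speedup > 1.0);
  return Saved / (1.0 - 1.0 / Speedup);
}
```

With these, `savedFraction(0.40, 2.0)` gives 0.20, in line with "just over 20%" for a speedup of "over 2x". `impliedFraction(0.08, 2.0)` gives 0.16, meaning an 8% saving from a 2x speedup would correspond to the method taking roughly 16% of optimizer time before this patch.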

If we can speed up the overall time by 8-9% by improving it then it suggests that the total time is much larger, which I find somewhat surprising (presumably, we should be spending the vast majority of our time inside of the optimizations themselves).

Me too. Turns out this method is really, really hot.

Is this module you're working on just a really pathological case?

It is entirely possible this module is pathological -- I've not seen this when profiling other compiles.

Unfortunately, it is a pathological case that is *generated code* that for me, happens to represent like 10% of all the files we compile and many of the slowest files we compile. So even if this module is really, really weird I still need it to compile very fast. =]

Or is this maybe a situation where the denormalized ref edge representation is causing a bunch of extra work?

I'm sure this is part of the problem. But fixing this is (much) harder than making this routine fast. And it doesn't *appear* to be the primary problem. I instrumented this routine expecting to see removal of 100,000+ edges all the time and think "oh, so it's denormalized...". That wasn't what happened. In this (potentially pathological) case, there are only like 7k ref edges ever removed in the entire opt run! And most (some 3.5k) are removed due to a *single* ref edge becoming dead after the inliner runs. The fact that the optimizer is deleting one edge at a time is pretty strong evidence that the denormalization isn't as prominent in this problem as I would have expected. I'm pretty sure based on these numbers that the real reason this ends up slow in this module is the size of RefSCC that we're removing edges from.

Still, I suspect *this* patch to be mostly making the denormalized representation scale better (by batching their removal). But it seems much easier to do this than to switch representations, and this patch actually makes the code simpler, so it seemed like a fine intermediate step.

In D36352#833140, @chandlerc wrote:

I'm sure this is part of the problem. But fixing this is (much) harder than making this routine fast. And it doesn't *appear* to be the primary problem. I instrumented this routine expecting to see removal of 100,000+ edges all the time and think "oh, so it's denormalized...". That wasn't what happened. In this (potentially pathological) case, there are only like 7k ref edges ever removed in the entire opt run! And most (some 3.5k) are removed due to a *single* ref edge becoming dead after the inliner runs. The fact that the optimizer is deleting one edge at a time is pretty strong evidence that the denormalization isn't as prominent in this problem as I would have expected. I'm pretty sure based on these numbers that the real reason this ends up slow in this module is the size of RefSCC that we're removing edges from.

In case you haven't dug in, I just dug in a bit in Mathematica out of curiosity and it seems that the large RefSCC's come from the lazy descriptor initialization.

This is the only RefSCC in the .pb.cc file I tried that wasn't just a single node. The .proto this was generated from contains a message Foo containing a message Bar containing a message Message3 (.proto: https://reviews.llvm.org/F4745358, .pb.cc: https://reviews.llvm.org/F4745329, .pb.h: https://reviews.llvm.org/F4745572).

https://reviews.llvm.org/F4740125
(sorry, Mathematica doesn't have an easy way to jitter the nodes with this layout so that the labels don't overlap; bummer)

_Z31protobuf_AssignDesc_foo_2eprotov contains a call edge to _Z28protobuf_AddDesc_foo_2eprotov which contains ref edges to _ZN3FooC2Ev and the corresponding constructors for Baz and Message3. In turn, these constructors reference the vtable pointer which leads to the MergeFrom (the virtual one taking a ::google::protobuf::Message) and virtual MergePartialFromCodedStream which are big methods that recursively call into similar code of any contained messages.

In other words, it seems that there will always be a RefSCC that is of size approximately O(num messages defined in a given .proto file) which explains why you're seeing such massive RefSCC's.

(the dot file this came from is https://reviews.llvm.org/F4735714 which was generated with opt -passes=print-lcg-dot)

Still, I suspect *this* patch to be mostly making the denormalized representation scale better (by batching their removal). But it seems much easier to do this than to switch representations, and this patch actually makes the code simpler, so it seemed like a fine intermediate step.

Makes sense. Just looking at the graph above, it makes intuitive sense: since vtables don't contain pointers to other vtables, the big fanout doesn't end up being multiplied by the transitive closure of base classes or something like that. I.e. the ref fanout is limited by the size of a single vtable, rather than potentially the total size of all vtables in a module. So that's not terribly problematic.

Sorry, to be clear, the entire RefSCC is something like:

for each message Foo:
_ZN3FooC2Ev --(ref through vtable)-> _ZNK3Foo11GetMetadataEv --> _ZN12_GLOBAL__N_130protobuf_AssignDescriptorsOnceEv --ref-> _Z31protobuf_AssignDesc_foo_2eprotov --> _Z28protobuf_AddDesc_foo_2eprotov --> _ZN3FooC2Ev (and all the other C2 constructors)

Additionally, to make the RefSCC even larger, when message Foo contains a message Baz, then Foo::MergeFrom and Foo::MergePartialFromCodedStream end up referencing _ZN3FooC2Ev because they new a Baz object via the mutable_* methods (to make things even worse they also directly call Baz::{MergeFrom,MergePartialFromCodedStream}). So all the MergeFrom/MergePartialFromCodedStream end up in the RefSCC as well.
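
The cycle described above can be checked mechanically on a toy graph. In this sketch, which uses hypothetical names (`Graph`, `reaches`, `inSameComponent`) and integer node ids rather than anything from LLVM, two nodes belong to the same strongly connected component exactly when each can reach the other.

```cpp
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Iterative DFS: is To reachable from From along the edges in Adj?
bool reaches(const Graph &Adj, int From, int To) {
  std::vector<bool> Seen(Adj.size(), false);
  std::vector<int> Work{From};
  Seen[From] = true;
  while (!Work.empty()) {
    int U = Work.back();
    Work.pop_back();
    if (U == To)
      return true;
    for (int V : Adj[U])
      if (!Seen[V]) {
        Seen[V] = true;
        Work.push_back(V);
      }
  }
  return false;
}

// Two nodes share a strongly connected component iff each reaches the other.
bool inSameComponent(const Graph &Adj, int A, int B) {
  assert(A >= 0 && B >= 0 && "node ids are non-negative indices");
  return reaches(Adj, A, B) && reaches(Adj, B, A);
}
```

For example, numbering the chain as 0 = Foo constructor, 1 = GetMetadata, 2 = AssignDescriptorsOnce, 3 = AssignDesc, 4 = AddDesc with edges `{{1}, {2}, {3}, {4}, {0}}` puts nodes 0 and 3 in the same component. Dropping the single edge out of node 2 separates them.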

Really nice analysis. I hadn't gotten that far, so that is helpful. Sadly, the descriptor stuff being the source isn't too surprising to me.

Anyways, any thoughts about the patch itself? I've got at least one more fix prepared behind this.

craig.topper added a subscriber: craig.topper. Aug 6 2017, 4:11 PM

craig.topper added inline comments.

include/llvm/Analysis/LazyCallGraph.h
807 ↗	(On Diff #109863)	'intact' is one word
lib/Analysis/LazyCallGraph.cpp
1278 ↗	(On Diff #109863)	'current'

In D36352#833455, @chandlerc wrote:

Really nice analysis. I hadn't gotten that far, so that is helpful. Sadly, the descriptor stuff being the source isn't too surprising to me.

Anyways, any thoughts about the patch itself? I've got at least one more fix prepared behind this.

Overall it looks fairly mechanical and slightly cleaner after the patch. LGTM.

Also, I looked a bit more into the graph I posted above. There is one ref edge which is what really causes the large RefSCC: _ZN12_GLOBAL__N_130protobuf_AssignDescriptorsOnceEv -> _Z31protobuf_AssignDesc_foo_2eprotov. That edge comes from the once initialization code

GOOGLE_PROTOBUF_DECLARE_ONCE(protobuf_AssignDescriptors_once_);
inline void protobuf_AssignDescriptorsOnce() {
  ::google::protobuf::GoogleOnceInit(&protobuf_AssignDescriptors_once_,
                 &protobuf_AssignDesc_foo_2eproto);
}

(it's a ref edge, but if we were to fully inline through GoogleOnceInit it would become a call edge. The actual dispatch to protobuf_AssignDesc_foo_2eproto happens through an external function though so we're saved)
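
To see why this shows up as a ref edge rather than a call edge, here is a simplified, self-contained sketch. `onceInit` is a hypothetical stand-in for GoogleOnceInit (it is not thread-safe in the way a real once primitive is), and the other names only mimic the roles of the generated functions.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for GoogleOnceInit: runs Init the first time only.
// Init is invoked through a pointer, so this function contains no direct
// call to any particular initializer.
inline void onceInit(std::atomic<bool> &Done, void (*Init)()) {
  assert(Init && "initializer must be non-null");
  if (!Done.exchange(true))
    Init();
}

static int AssignCount = 0;
static std::atomic<bool> AssignOnce{false};

// Plays the role of protobuf_AssignDesc_foo_2eproto.
void assignDescriptors() { ++AssignCount; }

// Plays the role of protobuf_AssignDescriptorsOnce: it only takes the address
// of assignDescriptors (a reference) and never calls it directly.
void assignDescriptorsOnce() { onceInit(AssignOnce, &assignDescriptors); }
```

In `assignDescriptorsOnce` the only use of `assignDescriptors` is its address, which is the situation the comment above describes as a ref edge. A direct call would only appear if the indirect call inside the once helper were inlined and resolved.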

If that edge is deleted (which we can't do statically, but just for the sake of investigation), then these are the only remaining nontrivial RefSCC's.

https://reviews.llvm.org/F4829448

There is one nontrivial RefSCC of constant size per message (Foo, Bar, Message3) where all edges within the RefSCC are ref edges. These are just all the functions that statically reference the vtable (constructor/destructor), together with any virtual functions that call back into them. These are basically constant size independent of the number of messages.

Then there is one RefSCC that is O(number of messages in the .proto file): the protobuf_AddDesc_foo_2eproto RefSCC. Note that all edges here are call edges. Basically this routine just recursively default initializes all default instances, which themselves recursively initialize any instances they depend on. The thing that ties the SCC together is that default_instance() looks like this:

const Baz& Baz::default_instance() {
  if (default_instance_ == NULL) protobuf_AddDesc_foo_2eproto();
  return *default_instance_;
}

However, only the default instance stuff gets pulled into this SCC, so even though it is O(number of messages in the proto file) the constant is much smaller than when the edge _ZN12_GLOBAL__N_130protobuf_AssignDescriptorsOnceEv -> _Z31protobuf_AssignDesc_foo_2eprotov is present (which pulls in the MergeFrom and MergePartialFromCodedStream and a bunch of getters/setters and stuff into the RefSCC).

lib/Analysis/LazyCallGraph.cpp
1133–1142 ↗	(On Diff #109863)	I found this a bit confusing. SourceC and TargetCs seem to be only used for this bailout here. So the section-delineating comment `// Collect the SCCs for the source and targets.` is a bit misleading (seems like it is collecting them for use later). Maybe just move the `If all targets are in the same SCC as the source ...` comment up to there and move the `if` a bit closer to make that clear. In another patch you may want to consider putting the tarjan walk below into a separate function which I think would make this easier to understand. The main reason I think it seems pretty important to do that is to make it clear to the reader that we aren't doing anything fancy using our knowledge of the SourceC and TargetCs and are just walking everything in the RefSCC again, which makes the complexity obvious. (maybe beefing up the comments would help too) (then again, I can see why you'd want to keep it all inline, because above you raw delete edges and so some of the invariants might be broken until we finish the update.)

This revision is now accepted and ready to land. Aug 7 2017, 12:25 PM

In D36352#834258, @silvas wrote:

(it's a ref edge, but if we were to fully inline through GoogleOnceInit it would become a call edge. The actual dispatch to protobuf_AssignDesc_foo_2eproto happens through an external function though so we're saved)

I said "saved" because I was mistakenly thinking that making this a call edge would have a large effect (and forgot to reword before posting), but it actually doesn't since the edges into this mainly come from Foo::GetMetadata (and Bar, Message3) which are only ever called virtually, so there is always a ref edge in the way to prevent the creation of a large CallSCC.

One way to break up this large RefSCC is to notice that the fanout from the vtable ref edges in the ctors/dtors is contributing a lot to it being so large. If those edges are made more precise, then it can reduce the size. For example, by breaking the vtable ref edges we prevent the ref edges to virtual GetMetadata methods and so the huge RefSCC breaks up as I mentioned earlier.

This can basically be seen as increasing the precision of the reference visitation; right now, it just assumes that all transitively reachable function pointers are ref edges. This leads to a lot of false positives, as ref edges are just meant to be a conservative superset of all edges that we might eventually discover through static optimization as call edges. In this case, the C2 constructors for Foo, Baz, and Message3 can be seen to contain no indirect calls and all their callees are either external or contain no indirect calls. So we can avoid adding any ref edges at all. A simple context-insensitive bottom-up RPO tracking whether an indirect call instruction is present would be enough for this case (it would be another cache-busting walk over all instructions in the module, but that may be worth it to decrease the number of ref edges / ref edge operations).

I'm sure the tradeoffs and stuff for doing this kind of "may call through this function pointer" analysis are well studied in the Java literature, but I'm not really very familiar with it. The main difference for our use case here is that external functions count as "contributes no ref edges" instead of "could dynamically call potentially everything" (since we aren't concerned with runtime calls, but rather a conservative superset of statically discoverable direct calls).
But of course the point of this is to be fast and get rid of the most egregious false positive ref edges so something simple is probably enough.
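
A rough sketch of the kind of "is an indirect call reachable" tracking suggested here, under loud assumptions: `FunctionInfo` and `mayReachIndirectCall` are hypothetical, functions are identified by index, and a simple fixed-point iteration is used in place of the bottom-up ordering mentioned above. External functions are treated as contributing nothing, as the comment proposes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FunctionInfo {
  bool HasIndirectCall = false; // body contains an indirect call instruction
  bool IsExternal = false;      // declaration only; treated as contributing nothing
  std::vector<int> Callees;     // indices of directly called functions
};

// For each function, compute whether it or anything it transitively calls
// contains an indirect call. Iterates to a fixed point, so cycles are fine.
std::vector<bool> mayReachIndirectCall(const std::vector<FunctionInfo> &Fns) {
  std::vector<bool> May(Fns.size(), false);
  for (std::size_t I = 0; I < Fns.size(); ++I)
    May[I] = !Fns[I].IsExternal && Fns[I].HasIndirectCall;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Fns.size(); ++I) {
      if (May[I] || Fns[I].IsExternal)
        continue;
      for (int Callee : Fns[I].Callees) {
        assert(Callee >= 0 && static_cast<std::size_t>(Callee) < Fns.size());
        if (May[Callee]) {
          May[I] = true;
          Changed = true;
          break;
        }
      }
    }
  }
  return May;
}
```

Under this scheme a constructor whose callees are all external or free of indirect calls would be marked `false`, which is the case where the comment suggests ref edges could be skipped.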

Thanks for the review all! I've fixed the mentioned issues and am landing based on Sean's LGTM.

In D36352#834396, @silvas wrote:

One way to break up this large RefSCC is to notice that the fanout from the vtable ref edges in the ctors/dtors is contributing a lot to it being so large. If those edges are made more precise, then it can reduce the size. For example, by breaking the vtable ref edges we prevent the ref edges to virtual GetMetadata methods and so the huge RefSCC breaks up as I mentioned earlier.

This can basically be seen as increasing the precision of the reference visitation; right now, it just assumes that all transitively reachable function pointers are ref edges. This leads to a lot of false positives, as ref edges are just meant to be a conservative superset of all edges that we might eventually discover through static optimization as call edges. In this case, the C2 constructors for Foo, Baz, and Message3 can be seen to contain no indirect calls and all their callees are either external or contain no indirect calls. So we can avoid adding any ref edges at all. A simple context-insensitive bottom-up RPO tracking whether an indirect call instruction is present would be enough for this case (it would be another cache-busting walk over all instructions in the module, but that may be worth it to decrease the number of ref edges / ref edge operations).

I'm sure the tradeoffs and stuff for doing this kind of "may call through this function pointer" analysis are well studied in the Java literature, but I'm not really very familiar with it. The main difference for our use case here is that external functions count as "contributes no ref edges" instead of "could dynamically call potentially everything" (since we aren't concerned with runtime calls, but rather a conservative superset of statically discoverable direct calls).
But of course the point of this is to be fast and get rid of the most egregious false positive ref edges so something simple is probably enough.

Sadly, I think this is a bit harder than it seems... We rely on reference edges to track *propagation* as well. So you can imagine a function F that is a leaf function and returns a function pointer. All callers might also have no indirect calls, but if we can inline F into them and inline some *other* function into them (or into their callers) we can still end up collapsing to a call.

Essentially, I think we'd end up looking at the entire reachable partition of the graph for indirect calls, and would almost always find one. =/
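
A small illustration of the propagation concern, with hypothetical function names. Neither `getCallback` nor `forward` contains an indirect call, and neither do their callees, yet the function pointer they hand back ends up being called once enough inlining happens in `top`.

```cpp
#include <cassert>

using Callback = int (*)(int);

static int target(int X) { return X + 1; }

// Leaf function: makes no calls at all, it only returns a function pointer.
static Callback getCallback() { return &target; }

// Calls only getCallback; still no indirect call here or in its callee.
static Callback forward() { return getCallback(); }

// The single indirect call lives in an unrelated helper.
static int invoke(Callback F, int X) {
  assert(F && "callback must be non-null");
  return F(X);
}

// Once forward, getCallback, and invoke are all inlined into top, the
// indirect call can fold into a direct call of target.
int top(int X) { return invoke(forward(), X); }
```

So dropping the reference from `getCallback` to `target` because no indirect call is visible nearby would lose track of an edge that can later become a real call from `top`.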

But honestly, with the fixes coming, even this fairly extreme case becomes very fast to handle, so I'm not too worried about this.

lib/Analysis/LazyCallGraph.cpp
1133–1142 ↗	(On Diff #109863)	Yeah, this ended up not being relevant and being confusing. I've switched it to a much simpler and much more direct test for the early exit. Regarding the longer-term issue of separating the Tarjan walk out, subsequent patches are likely to make this slightly less appealing, but I'm happy to revisit this in a subsequent patch if it makes sense.

Closed by commit rL310450: [LCG] Switch one of the update methods for the LazyCallGraph to support (authored by chandlerc). Aug 9 2017, 2:08 AM

This revision was automatically updated to reflect the committed changes.

chandlerc marked an inline comment as done.

Revision Contents

Path	Size
llvm/trunk/include/llvm/Analysis/LazyCallGraph.h	33 lines
llvm/trunk/lib/Analysis/CGSCCPassManager.cpp	117 lines
llvm/trunk/lib/Analysis/LazyCallGraph.cpp	159 lines
llvm/trunk/unittests/Analysis/LazyCallGraphTest.cpp	117 lines

Diff 110334

llvm/trunk/include/llvm/Analysis/LazyCallGraph.h

(789 lines hidden; enclosing context: public:)
     /// formation, so this is always safe to call once you have the source
     /// RefSCC.
     ///
     /// This operation does not change the cyclic structure of the graph and so
     /// is very inexpensive. It may change the connectivity graph of the SCCs
     /// though, so be careful calling this while iterating over them.
     void removeOutgoingEdge(Node &SourceN, Node &TargetN);

-    /// Remove a ref edge which is entirely within this RefSCC.
+    /// Remove a list of ref edges which are entirely within this RefSCC.
     ///
-    /// Both the \a SourceN and the \a TargetN must be within this RefSCC.
-    /// Removing such an edge may break cycles that form this RefSCC and thus
-    /// this operation may change the RefSCC graph significantly. In
+    /// Both the \a SourceN and all of the \a TargetNs must be within this
+    /// RefSCC. Removing these edges may break cycles that form this RefSCC and
+    /// thus this operation may change the RefSCC graph significantly. In
     /// particular, this operation will re-form new RefSCCs based on the
     /// remaining connectivity of the graph. The following invariants are
     /// guaranteed to hold after calling this method:
     ///
-    /// 1) This RefSCC is still a RefSCC in the graph.
-    /// 2) This RefSCC will be the parent of any new RefSCCs. Thus, this RefSCC
-    /// is preserved as the root of any new RefSCC DAG formed.
-    /// 3) No RefSCC other than this RefSCC has its member set changed (this is
+    /// 1) If a ref-cycle remains after removal, it leaves this RefSCC intact
+    /// and in the graph. No new RefSCCs are built.
+    /// 2) Otherwise, this RefSCC will be dead after this call and no longer in
+    /// the graph or the postorder traversal of the call graph. Any iterator
+    /// pointing at this RefSCC will become invalid.
+    /// 3) All newly formed RefSCCs will be returned and the order of the
+    /// RefSCCs returned will be a valid postorder traversal of the new
+    /// RefSCCs.
+    /// 4) No RefSCC other than this RefSCC has its member set changed (this is
     /// inherent in the definition of removing such an edge).
-    /// 4) All of the parent links of the RefSCC graph will be updated to
-    /// reflect the new RefSCC structure.
-    /// 5) All RefSCCs formed out of this RefSCC, excluding this RefSCC, will
-    /// be returned in post-order.
-    /// 6) The order of the RefSCCs in the vector will be a valid postorder
-    /// traversal of the new RefSCCs.
     ///
     /// These invariants are very important to ensure that we can build
     /// optimization pipelines on top of the CGSCC pass manager which
     /// intelligently update the RefSCC graph without invalidating other parts
     /// of the RefSCC graph.
     ///
     /// Note that we provide no routine to remove a call edge. Instead, you
     /// must first switch it to a ref edge using \c switchInternalEdgeToRef.
     /// This split API is intentional as each of these two steps can invalidate
     /// a different aspect of the graph structure and needs to have the
     /// invalidation handled independently.
     ///
     /// The runtime complexity of this method is, in the worst case, O(V+E)
     /// where V is the number of nodes in this RefSCC and E is the number of
     /// edges leaving the nodes in this RefSCC. Note that E includes both edges
     /// within this RefSCC and edges from this RefSCC to child RefSCCs. Some
     /// effort has been made to minimize the overhead of common cases such as
     /// self-edges and edge removals which result in a spanning tree with no
-    /// more cycles. There are also detailed comments within the implementation
-    /// on techniques which could substantially improve this routine's
-    /// efficiency.
+    /// more cycles.
     SmallVector<RefSCC *, 1> removeInternalRefEdge(Node &SourceN,
-                                                   Node &TargetN);
+                                                   ArrayRef<Node *> TargetNs);

     /// A convenience wrapper around the above to handle trivial cases of
     /// inserting a new call edge.
     ///
     /// This is trivial whenever the target is in the same SCC as the source or
     /// the edge is an outgoing edge to some descendant SCC. In these cases
     /// there is no change to the cyclic structure of SCCs or RefSCCs.
     ///
(437 lines hidden)

llvm/trunk/lib/Analysis/CGSCCPassManager.cpp

(453 lines hidden; enclosing context: LazyCallGraph::SCC &llvm::updateCGAndAnalysisManagerForFunctionPass()
   // Include synthetic reference edges to known, defined lib functions.
   for (auto *F : G.getLibFunctions())
     // While the list of lib functions doesn't have repeats, don't re-visit
     // anything handled above.
     if (!Visited.count(F))
       VisitRef(*F);

   // First remove all of the edges that are no longer present in this function.
-  // We have to build a list of dead targets first and then remove them as the
-  // data structures will all be invalidated by removing them.
-  SmallVector<PointerIntPair<Node *, 1, Edge::Kind>, 4> DeadTargets;
-  for (Edge &E : *N)
-    if (!RetainedEdges.count(&E.getNode()))
-      DeadTargets.push_back({&E.getNode(), E.getKind()});
-  for (auto DeadTarget : DeadTargets) {
-    Node &TargetN = *DeadTarget.getPointer();
-    bool IsCall = DeadTarget.getInt() == Edge::Call;
-    SCC &TargetC = *G.lookupSCC(TargetN);
-    RefSCC &TargetRC = TargetC.getOuterRefSCC();
-
-    if (&TargetRC != RC) {
-      RC->removeOutgoingEdge(N, TargetN);
-      if (DebugLogging)
-        dbgs() << "Deleting outgoing edge from '" << N << "' to '" << TargetN
-               << "'\n";
+  // The first step makes these edges uniformly ref edges and accumulates them
+  // into a separate data structure so removal doesn't invalidate anything.
+  SmallVector<Node *, 4> DeadTargets;
+  for (Edge &E : *N) {
+    if (RetainedEdges.count(&E.getNode()))
       continue;
-    }
-    if (DebugLogging)
-      dbgs() << "Deleting internal " << (IsCall ? "call" : "ref")
-             << " edge from '" << N << "' to '" << TargetN << "'\n";

-    if (IsCall) {
+    SCC &TargetC = *G.lookupSCC(E.getNode());
+    RefSCC &TargetRC = TargetC.getOuterRefSCC();
+    if (&TargetRC == RC && E.isCall()) {
       if (C != &TargetC) {
         // For separate SCCs this is trivial.
-        RC->switchTrivialInternalEdgeToRef(N, TargetN);
+        RC->switchTrivialInternalEdgeToRef(N, E.getNode());
       } else {
         // Now update the call graph.
-        C = incorporateNewSCCRange(RC->switchInternalEdgeToRef(N, TargetN), G,
-                                   N, C, AM, UR, DebugLogging);
+        C = incorporateNewSCCRange(RC->switchInternalEdgeToRef(N, E.getNode()),
+                                   G, N, C, AM, UR, DebugLogging);
       }
     }

-    auto NewRefSCCs = RC->removeInternalRefEdge(N, TargetN);
+    // Now that this is ready for actual removal, put it into our list.
+    DeadTargets.push_back(&E.getNode());
+  }
+  // Remove the easy cases quickly and actually pull them out of our list.
+  DeadTargets.erase(
+      llvm::remove_if(DeadTargets,
+                      [&](Node *TargetN) {
+                        SCC &TargetC = *G.lookupSCC(*TargetN);
+                        RefSCC &TargetRC = TargetC.getOuterRefSCC();
+
+                        // We can't trivially remove internal targets, so skip
+                        // those.
+                        if (&TargetRC == RC)
+                          return false;
+
+                        RC->removeOutgoingEdge(N, *TargetN);
+                        if (DebugLogging)
+                          dbgs() << "Deleting outgoing edge from '" << N
+                                 << "' to '" << TargetN << "'\n";
+                        return true;
+                      }),
+      DeadTargets.end());
+
+  // Now do a batch removal of the internal ref edges left.
+  auto NewRefSCCs = RC->removeInternalRefEdge(N, DeadTargets);
   if (!NewRefSCCs.empty()) {
+    // The old RefSCC is dead, mark it as such.
+    UR.InvalidatedRefSCCs.insert(RC);
+
     // Note that we don't bother to invalidate analyses as ref-edge
     // connectivity is not really observable in any way and is intended
     // exclusively to be used for ordering of transforms rather than for
     // analysis conclusions.

-    // The RC worklist is in reverse postorder, so we first enqueue the
-    // current RefSCC as it will remain the parent of all split RefSCCs, then
-    // we enqueue the new ones in RPO except for the one which contains the
-    // source node as that is the "bottom" we will continue processing in the
-    // bottom-up walk.
-    UR.RCWorklist.insert(RC);
-    if (DebugLogging)
-      dbgs() << "Enqueuing the existing RefSCC in the update worklist: "
-             << *RC << "\n";
-    // Update the RC to the "bottom".
+    // Update RC to the "bottom".
     assert(G.lookupSCC(N) == C && "Changed the SCC when splitting RefSCCs!");
     RC = &C->getOuterRefSCC();
     assert(G.lookupRefSCC(N) == RC && "Failed to update current RefSCC!");

+    // The RC worklist is in reverse postorder, so we enqueue the new ones in
+    // RPO except for the one which contains the source node as that is the
+    // "bottom" we will continue processing in the bottom-up walk.
     assert(NewRefSCCs.front() == RC &&
            "New current RefSCC not first in the returned list!");
-    for (RefSCC *NewRC : reverse(
-             make_range(std::next(NewRefSCCs.begin()), NewRefSCCs.end()))) {
+    for (RefSCC *NewRC :
+         reverse(make_range(std::next(NewRefSCCs.begin()), NewRefSCCs.end()))) {
       assert(NewRC != RC && "Should not encounter the current RefSCC further "
"in the postorder list of new RefSCCs.");		"in the postorder list of new RefSCCs.");
UR.RCWorklist.insert(NewRC);		UR.RCWorklist.insert(NewRC);
if (DebugLogging)		if (DebugLogging)
dbgs() << "Enqueuing a new RefSCC in the update worklist: " << *NewRC		dbgs() << "Enqueuing a new RefSCC in the update worklist: " << *NewRC
<< "\n";		<< "\n";
}		}
}		}
}

// Next demote all the call edges that are now ref edges. This helps make		// Next demote all the call edges that are now ref edges. This helps make
// the SCCs small which should minimize the work below as we don't want to		// the SCCs small which should minimize the work below as we don't want to
// form cycles that this would break.		// form cycles that this would break.
for (Node *RefTarget : DemotedCallTargets) {		for (Node *RefTarget : DemotedCallTargets) {
SCC &TargetC = *G.lookupSCC(*RefTarget);		SCC &TargetC = *G.lookupSCC(*RefTarget);
RefSCC &TargetRC = TargetC.getOuterRefSCC();		RefSCC &TargetRC = TargetC.getOuterRefSCC();

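The new pass-manager code above first strips the "easy" dead targets (those in a different RefSCC) out of `DeadTargets` with an erase/remove_if pass, and only then hands the remaining internal targets to the batch removal. A minimal standalone sketch of that partitioning step follows. It assumes plain standard-library containers; `ComponentMap` and `takeOutgoingTargets` are hypothetical names for illustration and are not part of the LazyCallGraph API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the node -> RefSCC lookup (not the LLVM API):
// maps a node name to the id of the component that contains it.
using ComponentMap = std::unordered_map<std::string, int>;

// Removes from `deadTargets` every target that lives in a different component
// than `sourceComponent` and returns those targets. Whatever remains in
// `deadTargets` afterwards is internal and needs the batch removal.
std::vector<std::string>
takeOutgoingTargets(std::vector<std::string> &deadTargets,
                    const ComponentMap &componentOf, int sourceComponent) {
  std::vector<std::string> outgoing;
  auto isOutgoing = [&](const std::string &target) {
    // Internal targets cannot be removed trivially, so keep them in the list.
    if (componentOf.at(target) == sourceComponent)
      return false;
    // Outgoing edge: handle it right away and drop it from the list.
    outgoing.push_back(target);
    return true;
  };
  deadTargets.erase(
      std::remove_if(deadTargets.begin(), deadTargets.end(), isOutgoing),
      deadTargets.end());
  return outgoing;
}
```

As in the diff, the predicate does the cheap work as a side effect, so a single pass over the list both processes the outgoing edges and compacts the remaining internal ones.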

llvm/trunk/lib/Analysis/LazyCallGraph.cpp

// ...

// First remove it from the node.		// First remove it from the node.
bool Removed = SourceN->removeEdgeInternal(TargetN);		bool Removed = SourceN->removeEdgeInternal(TargetN);
(void)Removed;		(void)Removed;
assert(Removed && "Target not in the edge set for this caller?");		assert(Removed && "Target not in the edge set for this caller?");
}		}

SmallVector<LazyCallGraph::RefSCC *, 1>		SmallVector<LazyCallGraph::RefSCC *, 1>
LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN, Node &TargetN) {		LazyCallGraph::RefSCC::removeInternalRefEdge(Node &SourceN,
assert(!(*SourceN)[TargetN].isCall() &&		ArrayRef<Node *> TargetNs) {
"Cannot remove a call edge, it must first be made a ref edge");		// We return a list of the resulting new RefSCCs in post-order.
		SmallVector<RefSCC *, 1> Result;

#ifndef NDEBUG		#ifndef NDEBUG
// In a debug build, verify the RefSCC is valid to start with and when this		// In a debug build, verify the RefSCC is valid to start with and that either
// routine finishes.		// we return an empty list of result RefSCCs and this RefSCC remains valid,
		// or we return new RefSCCs and this RefSCC is dead.
verify();		verify();
auto VerifyOnExit = make_scope_exit([&]() { verify(); });		auto VerifyOnExit = make_scope_exit([&]() {
		if (Result.empty()) {
		verify();
		} else {
		assert(!G && "A dead RefSCC should have its graph pointer nulled.");
		assert(SCCs.empty() && "A dead RefSCC should have no SCCs in it.");
		for (RefSCC *RC : Result)
		RC->verify();
		}
		});
#endif		#endif

// First remove the actual edge.		// First remove the actual edges.
bool Removed = SourceN->removeEdgeInternal(TargetN);		for (Node *TargetN : TargetNs) {
		assert(!(*SourceN)[*TargetN].isCall() &&
		"Cannot remove a call edge, it must first be made a ref edge");

		bool Removed = SourceN->removeEdgeInternal(*TargetN);
(void)Removed;		(void)Removed;
assert(Removed && "Target not in the edge set for this caller?");		assert(Removed && "Target not in the edge set for this caller?");
		}

// We return a list of the resulting new RefSCCs in post-order.		// Direct self references don't impact the ref graph at all.
SmallVector<RefSCC *, 1> Result;		if (llvm::all_of(TargetNs,
		[&](Node *TargetN) { return &SourceN == TargetN; }))
// Direct recursion doesn't impact the SCC graph at all.
if (&SourceN == &TargetN)
return Result;		return Result;

// If this ref edge is within an SCC then there are sufficient other edges to		// If all targets are in the same SCC as the source, because no call edges
// form a cycle without this edge so removing it is a no-op.		// were removed there is no RefSCC structure change.
SCC &SourceC = *G->lookupSCC(SourceN);		SCC &SourceC = *G->lookupSCC(SourceN);
SCC &TargetC = *G->lookupSCC(TargetN);		if (llvm::all_of(TargetNs, [&](Node *TargetN) {
if (&SourceC == &TargetC)		return G->lookupSCC(*TargetN) == &SourceC;
		}))
return Result;		return Result;

// We build somewhat synthetic new RefSCCs by providing a postorder mapping		// We build somewhat synthetic new RefSCCs by providing a postorder mapping
// for each inner SCC. We also store these associated with nodes rather		// for each inner SCC. We also store these associated with nodes rather
// than SCCs because this saves a round-trip through the node->SCC map and in		// than SCCs because this saves a round-trip through the node->SCC map and in
// the common case, SCCs are small. We will verify that we always give the		// the common case, SCCs are small. We will verify that we always give the
// same number to every node in the SCC such that these are equivalent.		// same number to every node in the SCC such that these are equivalent.
const int RootPostOrderNumber = 0;		int PostOrderNumber = 0;
int PostOrderNumber = RootPostOrderNumber + 1;
SmallDenseMap<Node *, int> PostOrderMapping;		SmallDenseMap<Node *, int> PostOrderMapping;

// Every node in the target SCC can already reach every node in this RefSCC
// (by definition). It is the only node we know will stay inside this RefSCC.
// Everything which transitively reaches Target will also remain in the
// RefSCC. We handle this by pre-marking that the nodes in the target SCC map
// back to the root post order number.
//
// This also enables us to take a very significant short-cut in the standard
// Tarjan walk to re-form RefSCCs below: whenever we build an edge that
// references the target node, we know that the target node eventually
// references all other nodes in our walk. As a consequence, we can detect
// and handle participants in that cycle without walking all the edges that
// form the connections, and instead by relying on the fundamental guarantee
// coming into this operation.
for (Node &N : TargetC)
PostOrderMapping[&N] = RootPostOrderNumber;

// Reset all the other nodes to prepare for a DFS over them, and add them to		// Reset all the other nodes to prepare for a DFS over them, and add them to
// our worklist.		// our worklist.
SmallVector<Node *, 8> Worklist;		SmallVector<Node *, 8> Worklist;
for (SCC *C : SCCs) {		for (SCC *C : SCCs) {
if (C == &TargetC)
continue;

for (Node &N : *C)		for (Node &N : *C)
N.DFSNumber = N.LowLink = 0;		N.DFSNumber = N.LowLink = 0;

Worklist.append(C->Nodes.begin(), C->Nodes.end());		Worklist.append(C->Nodes.begin(), C->Nodes.end());
}		}

auto MarkNodeForSCCNumber = [&PostOrderMapping](Node &N, int Number) {		auto MarkNodeForSCCNumber = [&PostOrderMapping](Node &N, int Number) {
N.DFSNumber = N.LowLink = -1;		N.DFSNumber = N.LowLink = -1;
// ...
// Continue, resetting to the child node.		// Continue, resetting to the child node.
ChildN.LowLink = ChildN.DFSNumber = NextDFSNumber++;		ChildN.LowLink = ChildN.DFSNumber = NextDFSNumber++;
N = &ChildN;		N = &ChildN;
I = ChildN->begin();		I = ChildN->begin();
E = ChildN->end();		E = ChildN->end();
continue;		continue;
}		}
if (ChildN.DFSNumber == -1) {		if (ChildN.DFSNumber == -1) {
// Check if this edge's target node connects to the deleted edge's
// target node. If so, we know that every node connected will end up
// in this RefSCC, so collapse the entire current stack into the root
// slot in our SCC numbering. See above for the motivation of
// optimizing the target connected nodes in this way.
auto PostOrderI = PostOrderMapping.find(&ChildN);
if (PostOrderI != PostOrderMapping.end() &&
PostOrderI->second == RootPostOrderNumber) {
MarkNodeForSCCNumber(*N, RootPostOrderNumber);
while (!PendingRefSCCStack.empty())
MarkNodeForSCCNumber(*PendingRefSCCStack.pop_back_val(),
RootPostOrderNumber);
while (!DFSStack.empty())
MarkNodeForSCCNumber(*DFSStack.pop_back_val().first,
RootPostOrderNumber);
// Ensure we break all the way out of the enclosing loop.
N = nullptr;
break;
}

// If this child isn't currently in this RefSCC, no need to process		// If this child isn't currently in this RefSCC, no need to process
// it.		// it.
++I;		++I;
continue;		continue;
}		}

// Track the lowest link of the children, if any are still in the stack.		// Track the lowest link of the children, if any are still in the stack.
// Any child not on the stack will have a LowLink of -1.		// Any child not on the stack will have a LowLink of -1.
assert(ChildN.LowLink != 0 &&		assert(ChildN.LowLink != 0 &&
"Low-link must not be zero with a non-zero DFS number.");		"Low-link must not be zero with a non-zero DFS number.");
if (ChildN.LowLink >= 0 && ChildN.LowLink < N->LowLink)		if (ChildN.LowLink >= 0 && ChildN.LowLink < N->LowLink)
N->LowLink = ChildN.LowLink;		N->LowLink = ChildN.LowLink;
++I;		++I;
}		}
if (!N)
// We short-circuited this node.
break;

// We've finished processing N and its descendents, put it on our pending		// We've finished processing N and its descendents, put it on our pending
// stack to eventually get merged into a RefSCC.		// stack to eventually get merged into a RefSCC.
PendingRefSCCStack.push_back(N);		PendingRefSCCStack.push_back(N);

// If this node is linked to some lower entry, continue walking up the		// If this node is linked to some lower entry, continue walking up the
// stack.		// stack.
if (N->LowLink != N->DFSNumber) {		if (N->LowLink != N->DFSNumber) {
// ...
PendingRefSCCStack.erase(RefSCCNodes.end().base(),		PendingRefSCCStack.erase(RefSCCNodes.end().base(),
PendingRefSCCStack.end());		PendingRefSCCStack.end());
} while (!DFSStack.empty());		} while (!DFSStack.empty());

assert(DFSStack.empty() && "Didn't flush the entire DFS stack!");		assert(DFSStack.empty() && "Didn't flush the entire DFS stack!");
assert(PendingRefSCCStack.empty() && "Didn't flush all pending nodes!");		assert(PendingRefSCCStack.empty() && "Didn't flush all pending nodes!");
} while (!Worklist.empty());		} while (!Worklist.empty());

// We now have a post-order numbering for RefSCCs and a mapping from each		// If we only ever needed one post-order number, we reformed a ref-cycle for
// node in this RefSCC to its final RefSCC. We create each new RefSCC node		// every node so the RefSCC remains unchanged.
// (re-using this RefSCC node for the root) and build a radix-sort style map		if (PostOrderNumber == 1)
// from postorder number to the RefSCC. We then append SCCs to each of these		return Result;
// RefSCCs in the order they occurred in the original SCCs container.
for (int i = 1; i < PostOrderNumber; ++i)		// Otherwise we create a collection of new RefSCC nodes and build
		// a radix-sort style map from postorder number to these new RefSCCs. We then
		// append SCCs to each of these RefSCCs in the order they occurred in the
		// original SCCs container.
		for (int i = 0; i < PostOrderNumber; ++i)
Result.push_back(G->createRefSCC(*G));		Result.push_back(G->createRefSCC(*G));

// Insert the resulting postorder sequence into the global graph postorder		// Insert the resulting postorder sequence into the global graph postorder
// sequence before the current RefSCC in that sequence. The idea being that		// sequence before the current RefSCC in that sequence, and then remove the
// this RefSCC is the target of the reference edge removed, and thus has		// current one.
// a direct or indirect edge to every other RefSCC formed and so must be at
// the end of any postorder traversal.
//		//
// FIXME: It'd be nice to change the APIs so that we returned an iterator		// FIXME: It'd be nice to change the APIs so that we returned an iterator
// range over the global postorder sequence and generally use that sequence		// range over the global postorder sequence and generally use that sequence
// rather than building a separate result vector here.		// rather than building a separate result vector here.
if (!Result.empty()) {
int Idx = G->getRefSCCIndex(*this);		int Idx = G->getRefSCCIndex(*this);
G->PostOrderRefSCCs.insert(G->PostOrderRefSCCs.begin() + Idx,		G->PostOrderRefSCCs.erase(G->PostOrderRefSCCs.begin() + Idx);
Result.begin(), Result.end());		G->PostOrderRefSCCs.insert(G->PostOrderRefSCCs.begin() + Idx, Result.begin(),
		Result.end());
for (int i : seq<int>(Idx, G->PostOrderRefSCCs.size()))		for (int i : seq<int>(Idx, G->PostOrderRefSCCs.size()))
G->RefSCCIndices[G->PostOrderRefSCCs[i]] = i;		G->RefSCCIndices[G->PostOrderRefSCCs[i]] = i;
assert(G->PostOrderRefSCCs[G->getRefSCCIndex(*this)] == this &&
"Failed to update this RefSCC's index after insertion!");
}

for (SCC *C : SCCs) {		for (SCC *C : SCCs) {
auto PostOrderI = PostOrderMapping.find(&*C->begin());		auto PostOrderI = PostOrderMapping.find(&*C->begin());
assert(PostOrderI != PostOrderMapping.end() &&		assert(PostOrderI != PostOrderMapping.end() &&
"Cannot have missing mappings for nodes!");		"Cannot have missing mappings for nodes!");
int SCCNumber = PostOrderI->second;		int SCCNumber = PostOrderI->second;
#ifndef NDEBUG		#ifndef NDEBUG
for (Node &N : *C)		for (Node &N : *C)
assert(PostOrderMapping.find(&N)->second == SCCNumber &&		assert(PostOrderMapping.find(&N)->second == SCCNumber &&
"Cannot have different numbers for nodes in the same SCC!");		"Cannot have different numbers for nodes in the same SCC!");
#endif		#endif
if (SCCNumber == 0)
// The root node is handled separately by removing the SCCs.
continue;

RefSCC &RC = *Result[SCCNumber - 1];		RefSCC &RC = *Result[SCCNumber];
int SCCIndex = RC.SCCs.size();		int SCCIndex = RC.SCCs.size();
RC.SCCs.push_back(C);		RC.SCCs.push_back(C);
RC.SCCIndices[C] = SCCIndex;		RC.SCCIndices[C] = SCCIndex;
C->OuterRefSCC = &RC;		C->OuterRefSCC = &RC;
}		}

// Now erase all but the root's SCCs.		// Now that we've moved things into the new RefSCCs, clear out our current
SCCs.erase(remove_if(SCCs,		// one.
[&](SCC *C) {		G = nullptr;
return PostOrderMapping.lookup(&*C->begin()) !=		SCCs.clear();
RootPostOrderNumber;
}),
SCCs.end());
SCCIndices.clear();		SCCIndices.clear();
for (int i = 0, Size = SCCs.size(); i < Size; ++i)
SCCIndices[SCCs[i]] = i;

#ifndef NDEBUG
// Verify all of the new RefSCCs.
for (RefSCC *RC : Result)
RC->verify();
#endif

// Return the new list of SCCs.		// Return the new list of SCCs.
return Result;		return Result;
}		}

void LazyCallGraph::RefSCC::handleTrivialEdgeInsertion(Node &SourceN,		void LazyCallGraph::RefSCC::handleTrivialEdgeInsertion(Node &SourceN,
Node &TargetN) {		Node &TargetN) {
// The only trivial case that requires any graph updates is when we add new		// The only trivial case that requires any graph updates is when we add new
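The new contract of `removeInternalRefEdge` shown above is: remove a batch of edges sharing one source, return an empty list if the component structure is unchanged, and otherwise return brand new components in post-order. A simplified standalone sketch of that contract follows. It is not the LLVM implementation: it assumes integer node ids, uses a textbook recursive Tarjan walk without the worklist and DFS-number bookkeeping of the real code, and `Graph` and `removeEdgesAndSplit` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <vector>

using Graph = std::vector<std::set<int>>; // Graph[n] holds the successors of n

// Removes the edges source -> t for every t in `targets`, then recomputes the
// strongly connected components among `members` (assumed to form a single
// component on entry). Returns an empty vector if `members` is still one
// component, otherwise the new components in post-order.
std::vector<std::vector<int>>
removeEdgesAndSplit(Graph &g, const std::vector<int> &members, int source,
                    const std::vector<int> &targets) {
  // First remove the actual edges.
  for (int t : targets) {
    bool removed = g[source].erase(t) == 1;
    (void)removed;
    assert(removed && "Target not in the edge set for this source?");
  }

  std::vector<int> index(g.size(), -1), low(g.size(), 0);
  std::vector<bool> onStack(g.size(), false), isMember(g.size(), false);
  for (int n : members)
    isMember[n] = true;
  std::vector<int> stack;
  std::vector<std::vector<int>> result;
  int next = 0;

  // Textbook Tarjan: a component is emitted only after everything it can
  // reach has been emitted, which yields a post-order.
  std::function<void(int)> visit = [&](int n) {
    index[n] = low[n] = next++;
    stack.push_back(n);
    onStack[n] = true;
    for (int child : g[n]) {
      if (!isMember[child])
        continue; // edges leaving the original component are irrelevant
      if (index[child] == -1) {
        visit(child);
        low[n] = std::min(low[n], low[child]);
      } else if (onStack[child]) {
        low[n] = std::min(low[n], index[child]);
      }
    }
    if (low[n] == index[n]) {
      std::vector<int> component;
      int m;
      do {
        m = stack.back();
        stack.pop_back();
        onStack[m] = false;
        component.push_back(m);
      } while (m != n);
      std::sort(component.begin(), component.end());
      result.push_back(component);
    }
  };
  for (int n : members)
    if (index[n] == -1)
      visit(n);

  // A single component means a cycle re-formed through every node, so the
  // structure is unchanged and nothing new is reported.
  if (result.size() == 1)
    result.clear();
  return result;
}
```

With three fully connected nodes a=0, b=1, c=2 (self-edges included), removing b -> a and b -> c in one call yields {b} followed by {a, c}, which is the shape the `InternalMultiEdgeRemoval` unit test in this revision checks for.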

llvm/trunk/unittests/Analysis/LazyCallGraphTest.cpp

TEST(LazyCallGraphTest, InlineAndDeleteFunction) {
// ...
// Then remove the old ones.		// Then remove the old ones.
LazyCallGraph::SCC &DC = *CG.lookupSCC(D2);		LazyCallGraph::SCC &DC = *CG.lookupSCC(D2);
auto NewCs = DRC.switchInternalEdgeToRef(D1, D2);		auto NewCs = DRC.switchInternalEdgeToRef(D1, D2);
EXPECT_EQ(&DC, CG.lookupSCC(D2));		EXPECT_EQ(&DC, CG.lookupSCC(D2));
EXPECT_EQ(NewCs.end(), std::next(NewCs.begin()));		EXPECT_EQ(NewCs.end(), std::next(NewCs.begin()));
LazyCallGraph::SCC &NewDC = *NewCs.begin();		LazyCallGraph::SCC &NewDC = *NewCs.begin();
EXPECT_EQ(&NewDC, CG.lookupSCC(D1));		EXPECT_EQ(&NewDC, CG.lookupSCC(D1));
EXPECT_EQ(&NewDC, CG.lookupSCC(D3));		EXPECT_EQ(&NewDC, CG.lookupSCC(D3));
auto NewRCs = DRC.removeInternalRefEdge(D1, D2);		auto NewRCs = DRC.removeInternalRefEdge(D1, {&D2});
EXPECT_EQ(&DRC, CG.lookupRefSCC(D2));		ASSERT_EQ(2u, NewRCs.size());
EXPECT_EQ(NewRCs.end(), std::next(NewRCs.begin()));		LazyCallGraph::RefSCC &NewDRC = *NewRCs[0];
LazyCallGraph::RefSCC &NewDRC = **NewRCs.begin();
EXPECT_EQ(&NewDRC, CG.lookupRefSCC(D1));		EXPECT_EQ(&NewDRC, CG.lookupRefSCC(D1));
EXPECT_EQ(&NewDRC, CG.lookupRefSCC(D3));		EXPECT_EQ(&NewDRC, CG.lookupRefSCC(D3));
EXPECT_FALSE(NewDRC.isParentOf(DRC));		LazyCallGraph::RefSCC &D2RC = *NewRCs[1];
EXPECT_TRUE(CRC.isParentOf(DRC));		EXPECT_EQ(&D2RC, CG.lookupRefSCC(D2));
		EXPECT_FALSE(NewDRC.isParentOf(D2RC));
		EXPECT_TRUE(CRC.isParentOf(D2RC));
EXPECT_TRUE(CRC.isParentOf(NewDRC));		EXPECT_TRUE(CRC.isParentOf(NewDRC));
EXPECT_TRUE(DRC.isParentOf(NewDRC));		EXPECT_TRUE(D2RC.isParentOf(NewDRC));
CRC.removeOutgoingEdge(C1, D2);		CRC.removeOutgoingEdge(C1, D2);
EXPECT_FALSE(CRC.isParentOf(DRC));		EXPECT_FALSE(CRC.isParentOf(D2RC));
EXPECT_TRUE(CRC.isParentOf(NewDRC));		EXPECT_TRUE(CRC.isParentOf(NewDRC));
EXPECT_TRUE(DRC.isParentOf(NewDRC));		EXPECT_TRUE(D2RC.isParentOf(NewDRC));

// Now that we've updated the call graph, D2 is dead, so remove it.		// Now that we've updated the call graph, D2 is dead, so remove it.
CG.removeDeadFunction(D2F);		CG.removeDeadFunction(D2F);

// Check that the graph still looks the same.		// Check that the graph still looks the same.
EXPECT_EQ(&ARC, CG.lookupRefSCC(A1));		EXPECT_EQ(&ARC, CG.lookupRefSCC(A1));
EXPECT_EQ(&ARC, CG.lookupRefSCC(A2));		EXPECT_EQ(&ARC, CG.lookupRefSCC(A2));
EXPECT_EQ(&ARC, CG.lookupRefSCC(A3));		EXPECT_EQ(&ARC, CG.lookupRefSCC(A3));
// ...

TEST(LazyCallGraphTest, InternalEdgeRemoval) {
// ...
LazyCallGraph::Node &C = *CG.lookup(lookupFunction(*M, "c"));		LazyCallGraph::Node &C = *CG.lookup(lookupFunction(*M, "c"));
EXPECT_EQ(&RC, CG.lookupRefSCC(A));		EXPECT_EQ(&RC, CG.lookupRefSCC(A));
EXPECT_EQ(&RC, CG.lookupRefSCC(B));		EXPECT_EQ(&RC, CG.lookupRefSCC(B));
EXPECT_EQ(&RC, CG.lookupRefSCC(C));		EXPECT_EQ(&RC, CG.lookupRefSCC(C));

// Remove the edge from b -> a, which should leave the 3 functions still in		// Remove the edge from b -> a, which should leave the 3 functions still in
// a single connected component because of a -> b -> c -> a.		// a single connected component because of a -> b -> c -> a.
SmallVector<LazyCallGraph::RefSCC *, 1> NewRCs =		SmallVector<LazyCallGraph::RefSCC *, 1> NewRCs =
RC.removeInternalRefEdge(B, A);		RC.removeInternalRefEdge(B, {&A});
EXPECT_EQ(0u, NewRCs.size());		EXPECT_EQ(0u, NewRCs.size());
EXPECT_EQ(&RC, CG.lookupRefSCC(A));		EXPECT_EQ(&RC, CG.lookupRefSCC(A));
EXPECT_EQ(&RC, CG.lookupRefSCC(B));		EXPECT_EQ(&RC, CG.lookupRefSCC(B));
EXPECT_EQ(&RC, CG.lookupRefSCC(C));		EXPECT_EQ(&RC, CG.lookupRefSCC(C));
auto J = CG.postorder_ref_scc_begin();		auto J = CG.postorder_ref_scc_begin();
EXPECT_EQ(I, J);		EXPECT_EQ(I, J);
EXPECT_EQ(&RC, &*J);		EXPECT_EQ(&RC, &*J);
EXPECT_EQ(E, std::next(J));		EXPECT_EQ(E, std::next(J));

		// Increment I before we actually mutate the structure so that it remains
		// a valid iterator.
		++I;

// Remove the edge from c -> a, which should leave 'a' in the original RefSCC		// Remove the edge from c -> a, which should leave 'a' in the original RefSCC
// and form a new RefSCC for 'b' and 'c'.		// and form a new RefSCC for 'b' and 'c'.
NewRCs = RC.removeInternalRefEdge(C, A);		NewRCs = RC.removeInternalRefEdge(C, {&A});
EXPECT_EQ(1u, NewRCs.size());		ASSERT_EQ(2u, NewRCs.size());
EXPECT_EQ(&RC, CG.lookupRefSCC(A));		LazyCallGraph::RefSCC &BCRC = *NewRCs[0];
EXPECT_EQ(1, std::distance(RC.begin(), RC.end()));		LazyCallGraph::RefSCC &ARC = *NewRCs[1];
LazyCallGraph::RefSCC &RC2 = *CG.lookupRefSCC(B);		EXPECT_EQ(&ARC, CG.lookupRefSCC(A));
EXPECT_EQ(&RC2, CG.lookupRefSCC(C));		EXPECT_EQ(1, std::distance(ARC.begin(), ARC.end()));
EXPECT_EQ(&RC2, NewRCs[0]);		EXPECT_EQ(&BCRC, CG.lookupRefSCC(B));
		EXPECT_EQ(&BCRC, CG.lookupRefSCC(C));
J = CG.postorder_ref_scc_begin();		J = CG.postorder_ref_scc_begin();
EXPECT_NE(I, J);		EXPECT_NE(I, J);
EXPECT_EQ(&RC2, &*J);		EXPECT_EQ(&BCRC, &*J);
		++J;
		EXPECT_NE(I, J);
		EXPECT_EQ(&ARC, &*J);
++J;		++J;
EXPECT_EQ(I, J);		EXPECT_EQ(I, J);
EXPECT_EQ(&RC, &*J);		EXPECT_EQ(E, J);
		}

		TEST(LazyCallGraphTest, InternalMultiEdgeRemoval) {
		LLVMContext Context;
		// A nice fully connected (including self-edges) RefSCC.
		std::unique_ptr<Module> M = parseAssembly(
		Context, "define void @a(i8** %ptr) {\n"
		"entry:\n"
		" store i8* bitcast (void(i8**)* @a to i8*), i8** %ptr\n"
		" store i8* bitcast (void(i8**)* @b to i8*), i8** %ptr\n"
		" store i8* bitcast (void(i8**)* @c to i8*), i8** %ptr\n"
		" ret void\n"
		"}\n"
		"define void @b(i8** %ptr) {\n"
		"entry:\n"
		" store i8* bitcast (void(i8**)* @a to i8*), i8** %ptr\n"
		" store i8* bitcast (void(i8**)* @b to i8*), i8** %ptr\n"
		" store i8* bitcast (void(i8**)* @c to i8*), i8** %ptr\n"
		" ret void\n"
		"}\n"
		"define void @c(i8** %ptr) {\n"
		"entry:\n"
		" store i8* bitcast (void(i8**)* @a to i8*), i8** %ptr\n"
		" store i8* bitcast (void(i8**)* @b to i8*), i8** %ptr\n"
		" store i8* bitcast (void(i8**)* @c to i8*), i8** %ptr\n"
		" ret void\n"
		"}\n");
		LazyCallGraph CG = buildCG(*M);

		// Force the graph to be fully expanded.
		CG.buildRefSCCs();
		auto I = CG.postorder_ref_scc_begin(), E = CG.postorder_ref_scc_end();
		LazyCallGraph::RefSCC &RC = *I;
		EXPECT_EQ(E, std::next(I));

		LazyCallGraph::Node &A = *CG.lookup(lookupFunction(*M, "a"));
		LazyCallGraph::Node &B = *CG.lookup(lookupFunction(*M, "b"));
		LazyCallGraph::Node &C = *CG.lookup(lookupFunction(*M, "c"));
		EXPECT_EQ(&RC, CG.lookupRefSCC(A));
		EXPECT_EQ(&RC, CG.lookupRefSCC(B));
		EXPECT_EQ(&RC, CG.lookupRefSCC(C));

		// Increment I before we actually mutate the structure so that it remains
		// a valid iterator.
++I;		++I;
EXPECT_EQ(E, I);
		// Remove the edges from b -> a and b -> c, leaving b in its own RefSCC.
		SmallVector<LazyCallGraph::RefSCC *, 1> NewRCs =
		RC.removeInternalRefEdge(B, {&A, &C});

		ASSERT_EQ(2u, NewRCs.size());
		LazyCallGraph::RefSCC &BRC = *NewRCs[0];
		LazyCallGraph::RefSCC &ACRC = *NewRCs[1];
		EXPECT_EQ(&BRC, CG.lookupRefSCC(B));
		EXPECT_EQ(1, std::distance(BRC.begin(), BRC.end()));
		EXPECT_EQ(&ACRC, CG.lookupRefSCC(A));
		EXPECT_EQ(&ACRC, CG.lookupRefSCC(C));
		auto J = CG.postorder_ref_scc_begin();
		EXPECT_NE(I, J);
		EXPECT_EQ(&BRC, &*J);
++J;		++J;
		EXPECT_NE(I, J);
		EXPECT_EQ(&ACRC, &*J);
		++J;
		EXPECT_EQ(I, J);
EXPECT_EQ(E, J);		EXPECT_EQ(E, J);
}		}

TEST(LazyCallGraphTest, InternalNoOpEdgeRemoval) {		TEST(LazyCallGraphTest, InternalNoOpEdgeRemoval) {
LLVMContext Context;		LLVMContext Context;
// A graph with a single cycle formed both from call and reference edges		// A graph with a single cycle formed both from call and reference edges
// which makes the reference edges trivial to delete. The graph looks like:		// which makes the reference edges trivial to delete. The graph looks like:
//		//
// ...
EXPECT_EQ(&RC, CG.lookupRefSCC(BN));		EXPECT_EQ(&RC, CG.lookupRefSCC(BN));
EXPECT_EQ(&RC, CG.lookupRefSCC(CN));		EXPECT_EQ(&RC, CG.lookupRefSCC(CN));
EXPECT_EQ(&C, CG.lookupSCC(AN));		EXPECT_EQ(&C, CG.lookupSCC(AN));
EXPECT_EQ(&C, CG.lookupSCC(BN));		EXPECT_EQ(&C, CG.lookupSCC(BN));
EXPECT_EQ(&C, CG.lookupSCC(CN));		EXPECT_EQ(&C, CG.lookupSCC(CN));

// Remove the edge from a -> c which doesn't change anything.		// Remove the edge from a -> c which doesn't change anything.
SmallVector<LazyCallGraph::RefSCC *, 1> NewRCs =		SmallVector<LazyCallGraph::RefSCC *, 1> NewRCs =
RC.removeInternalRefEdge(AN, CN);		RC.removeInternalRefEdge(AN, {&CN});
EXPECT_EQ(0u, NewRCs.size());		EXPECT_EQ(0u, NewRCs.size());
EXPECT_EQ(&RC, CG.lookupRefSCC(AN));		EXPECT_EQ(&RC, CG.lookupRefSCC(AN));
EXPECT_EQ(&RC, CG.lookupRefSCC(BN));		EXPECT_EQ(&RC, CG.lookupRefSCC(BN));
EXPECT_EQ(&RC, CG.lookupRefSCC(CN));		EXPECT_EQ(&RC, CG.lookupRefSCC(CN));
EXPECT_EQ(&C, CG.lookupSCC(AN));		EXPECT_EQ(&C, CG.lookupSCC(AN));
EXPECT_EQ(&C, CG.lookupSCC(BN));		EXPECT_EQ(&C, CG.lookupSCC(BN));
EXPECT_EQ(&C, CG.lookupSCC(CN));		EXPECT_EQ(&C, CG.lookupSCC(CN));
auto J = CG.postorder_ref_scc_begin();		auto J = CG.postorder_ref_scc_begin();
EXPECT_EQ(I, J);		EXPECT_EQ(I, J);
EXPECT_EQ(&RC, &*J);		EXPECT_EQ(&RC, &*J);
EXPECT_EQ(E, std::next(J));		EXPECT_EQ(E, std::next(J));

// Remove the edge from b -> a and c -> b; again this doesn't change		// Remove the edge from b -> a and c -> b; again this doesn't change
// anything.		// anything.
NewRCs = RC.removeInternalRefEdge(BN, AN);		NewRCs = RC.removeInternalRefEdge(BN, {&AN});
NewRCs = RC.removeInternalRefEdge(CN, BN);		NewRCs = RC.removeInternalRefEdge(CN, {&BN});
EXPECT_EQ(0u, NewRCs.size());		EXPECT_EQ(0u, NewRCs.size());
EXPECT_EQ(&RC, CG.lookupRefSCC(AN));		EXPECT_EQ(&RC, CG.lookupRefSCC(AN));
EXPECT_EQ(&RC, CG.lookupRefSCC(BN));		EXPECT_EQ(&RC, CG.lookupRefSCC(BN));
EXPECT_EQ(&RC, CG.lookupRefSCC(CN));		EXPECT_EQ(&RC, CG.lookupRefSCC(CN));
EXPECT_EQ(&C, CG.lookupSCC(AN));		EXPECT_EQ(&C, CG.lookupSCC(AN));
EXPECT_EQ(&C, CG.lookupSCC(BN));		EXPECT_EQ(&C, CG.lookupSCC(BN));
EXPECT_EQ(&C, CG.lookupSCC(CN));		EXPECT_EQ(&C, CG.lookupSCC(CN));
J = CG.postorder_ref_scc_begin();		J = CG.postorder_ref_scc_begin();

Revision Contents

Diff 110334

llvm/trunk/include/llvm/Analysis/LazyCallGraph.h

llvm/trunk/lib/Analysis/CGSCCPassManager.cpp

llvm/trunk/lib/Analysis/LazyCallGraph.cpp

llvm/trunk/unittests/Analysis/LazyCallGraphTest.cpp
