This is an archive of the discontinued LLVM Phabricator instance.

[LCG] Construct an actual call graph with call-edge SCCs nested inside reference-edge SCCs.
ClosedPublic

Authored by chandlerc on Feb 2 2016, 4:06 AM.

Details

Summary

This essentially builds a more normal call graph as a subgraph of the
"reference graph" that was the old model. This allows both to exist and
the different use cases to use the aspect which addresses their needs.
Specifically, the pass manager and other *ordering* constrained logic
can use the reference graph to achieve conservative order of visit,
while analyses reasoning about attributes and other properties derived
from reachability can reason about the direct call graph.

Note that this isn't yet complete: it doesn't model edges to
declarations or indirect calls yet. Those are planned for subsequent
patches to complete the set of information needed for traditional call
graph based analyses.

An important realization is that the call graph is a formal subset of
the reference graph and thus both can live within the same
data structure. All SCCs of the call graph are necessarily contained
within an SCC of the reference graph, etc.

The design is to build 'RefSCC's to model SCCs of the reference graph,
and then within them more literal SCCs for the call graph.

The formation of call graph SCCs is not done lazily, unlike reference
SCCs. Instead, once a reference SCC is formed, it directly builds the
call SCCs within it and stores them in post-order. This is used to
provide a consistent platform for mutation and update of the graph. The
post-order also allows for very efficient updates in common cases by
bounding the number of nodes (and thus edges) considered.

There is considerable common code that I'm still looking for the best
way to factor out between the various DFS implementations here. So far,
my attempts have made the code harder to read and understand despite
reducing the duplication, which seems a poor tradeoff. I've not given up
on figuring out the right way to do this, but I wanted to wait until
I at least had the system working and tested to continue attempting to
factor it differently.

This also requires introducing several new algorithms in order to handle
all of the incremental update scenarios for the more complex structure
involving two edge colorings. I've tried to comment the algorithms
sufficiently to make it clear how this is expected to work, but they may
still need more extensive documentation.

I have worked through the core algorithms on a whiteboard and I think
their underpinnings are largely correct. However, I still have a lot of
testing to do here. Several of the previous tests have not yet been
updated (and are commented out currently) and new tests have not been
written covering all of the intricasies of the new algorithms. I'm
planning to continue working on the testing, but based on discussions
with Sanjoy, it seemed worthwhile to start the review early. This seems
especially worthwhile considering the very sizable amount of code change
due to introducing both new structures and new algorithms.

I also know that there are some changes which are not strictly
necessarily coupled here. The process of developing this started out
with a very focused set of changes for the new structure of the graph
and algorithms, but subsequent changes to bring the APIs and code into
consistent and understandable patterns also ended up touching on other
aspects. There was no good way to separate these out without causing
*massive* merge conflicts. Ultimately, to a large degree this is
a rewrite of most of the core algorithms in the LCG class and so I don't
think it really matters much.

Anyways, looking forward to comments. I'll also update this as I make
more progress carefully testing the rest of the mutation logic.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc updated this revision to Diff 46636.Feb 2 2016, 4:06 AM
chandlerc retitled this revision from to [LCG] Construct an actual call graph with call-edge SCCs nested inside reference-edge SCCs..
chandlerc updated this object.
chandlerc added reviewers: reames, sanjoy, bogner, echristo.
chandlerc added a subscriber: llvm-commits.
sanjoy edited edge metadata.Feb 2 2016, 10:29 PM

Okay, this is a lot of code to review at once. :)

I'm okay with the general structure (SCC's nested in RefSCCs). I've only skimmed through the code so far, and have added some minor comments inline based on that.

lib/Analysis/LazyCallGraph.cpp
352 ↗(On Diff #46636)

Here and below in the later std::stable_partition, do you need to update SCCIndices?

408 ↗(On Diff #46636)

Should this be return ConnectedSet.count(C);? Since I understand you want { nodes reachable from Target } Target { nodes not reachable from Target } ?

424 ↗(On Diff #46636)

any SCC-wide properties except norecurse?

456 ↗(On Diff #46636)

Minor: assert message is wrong.

713 ↗(On Diff #46636)

This may be naive of me, but why can't this do exactly what switchInternalEdgeToCall does when it discovers that it needs to merge a set of SCC's into a new SCC (perhaps even share code with some suitable abstraction)? If it is expensive to keep a postorder of RefSCCs due to lazy generation (since you'll have to prepend onto a vector), can we apply essentially the same algorithm to the reverse postorder (that is always up to date) of RefSCCs?

sanjoy added inline comments.Feb 2 2016, 10:29 PM
include/llvm/Analysis/LazyCallGraph.h
473 ↗(On Diff #46636)

Nit: "existing"

487 ↗(On Diff #46636)

Nit: "existing"

499 ↗(On Diff #46636)

Nit: "existing"

765 ↗(On Diff #46636)

Why not map these to unsigned? Is making the integer type signed a semantic change?

lib/Analysis/LazyCallGraph.cpp
349 ↗(On Diff #46636)

Should this be !ConnectedSet.count(&TargetSCC)?

chandlerc updated this revision to Diff 47005.Feb 5 2016, 2:22 AM
chandlerc marked 7 inline comments as done.
chandlerc edited edge metadata.

Fleshed out unit tests, numerous bug fixes and missing APIs in order to write
unit tests effectively, and addressed Sanjoy's feedback.

Thanks so much for the review Sanjoy.

Now with much better testing (still checking on the last bit of coverage) and many, many bugs fixed. I've also replied inline to some of your questions and fixed all the issues you pointed out (that I could).

include/llvm/Analysis/LazyCallGraph.h
765 ↗(On Diff #46636)

Because I don't want 2^32 modular arithmetic behavior.

I use signed integers unless I *need* an unsigned integer. It makes me much more comfortable writing relational comparisons, etc.

lib/Analysis/LazyCallGraph.cpp
349 ↗(On Diff #46636)

Yep.

New unit tests catch this as well.

352 ↗(On Diff #46636)

Yep. Unit tests also catch this now.

I've also been able to merge some of the calls to this to simplify things.

408 ↗(On Diff #46636)

Yep. Again, unit tests now catch this and fixed.

424 ↗(On Diff #46636)

readnone? most of them are...

713 ↗(On Diff #46636)

So, I've not thought of a good way to retain the postorder of RefSCCs and use them here. It's made tricky because one of the goals of RefSCCS is for updates to one RefSCC to not impact other ones (for parallelism etc) and mutating a postorder list would likely do just that.

But it is a delightful optimization so I'm going to keep thinking about this. Maybe something will present itself once we have the users in hand and know what their usage looks like?

First round of comments and questions (I'll do a second pass soon):

(PS: I haven't reviewed the tests yet)

include/llvm/Analysis/LazyCallGraph.h
451 ↗(On Diff #47005)

I don't see where slice, Begin and End above are used.

480 ↗(On Diff #47005)

Why can't Parents be a container of const RefSCC *?

555 ↗(On Diff #47005)

s/SCC/RefSCC

558 ↗(On Diff #47005)

Minor: Might want to change this to "existing path from \p SourceN to \p TargetN" since the edge being inserted is not necessarily a call edge.

559 ↗(On Diff #47005)

Minor: "does not change the set of SCCs and RefSCCs" may be clearer (esp. since you use that language elsewhere).

585 ↗(On Diff #47005)

Was this FIXME for the postorder optimization we do in this change, or something else? If the former, then perhaps this FIXME should be removed now?

lib/Analysis/LazyCallGraph.cpp
224 ↗(On Diff #47005)

I'd remove the &TargetSCC == &SourceSCC clause, and instead just have this assert be <= i.

300 ↗(On Diff #47005)

Nit: sequence

334 ↗(On Diff #47005)

Minor: I'd use llvm::any_of for the inner loop over the outgoing edges.

358 ↗(On Diff #47005)

Nit: "the correct post-order"

404 ↗(On Diff #47005)

Minor: I'd use SCCIndices.find(&EdgeC)->second here, just so that we crash if &EdgeC didn't end up in SCCIndices due to a bug earlier.

462 ↗(On Diff #47005)

Nit: indentation

568 ↗(On Diff #47005)

This special case here makes me slightly uncomfortable. Unless you think it is important for performance (or other reasons I don't see yet), perhaps we can get rid of the // Force the target node to be in the old SCC. bit above (so that the ChildN.DFSNumber == -1 case is never "spuriously" taken), and instead down below add SCCNodes to OldSCC if it contains TargetN?

733 ↗(On Diff #47005)

Nit: spelling

840 ↗(On Diff #47005)

Given that you've used this pattern a lot, perhaps the interface should be Node &getNode() (which asserts that node exists) and perhaps an Node *getNodePtr() or bool hasNode() interface for clients that want to handle edges that don't yet have a node?

1010 ↗(On Diff #47005)

Why not have this DFS be over SCC s as nodes? That way we won't waste cycles DFS'ing inside an SCC; and it fits in better with the "SCC 's nested within RefSCC" design.

1033 ↗(On Diff #47005)

Might be useful to explicitly document (on the field) that DFSNumber for RefSCC (and SCC?) instances is -1 for all nodes unless we're mid-DFS. Perhaps we can even stick this invariant in verify()?

1042 ↗(On Diff #47005)

As I said earlier, unless there are cases where this really matters, I'd rather not have this special case here; but instead have a check on RefSCCNodes to see if it should be put in a new SCC or into TargetC.

1055 ↗(On Diff #47005)

Nit: "RefSCC"

1173 ↗(On Diff #47005)

Can there be cases where Result is empty, IsLeaf is false, and this was a leaf RefSCC before removeInternalRefEdge was called? If not, we can get rid of IsLeaf and update G->LeafRefSCCs only if Result is non-empty.

1182 ↗(On Diff #47005)

[Edit: also see above]

Doesn't !Result.empty() imply !IsLeaf (from the assert above)?

I think you need if (!WasLeafBeforeEdgeRemoval && !Result.empty()), but I think just checking for !IsLeaf will Do The Right Thing, since std::remove doesn't break if this isn't present in G->LeafRefSCCs.

sanjoy added inline comments.Feb 7 2016, 4:14 PM
include/llvm/Analysis/LazyCallGraph.h
727 ↗(On Diff #47005)

Why can't this (i.e. ContainingSCC) live as a field in Node?

chandlerc updated this revision to Diff 47303.Feb 9 2016, 1:44 AM
chandlerc marked 9 inline comments as done.

Update based on code review comments.

Updated resolving most of the code review comments. Some questions and responses below:

include/llvm/Analysis/LazyCallGraph.h
451 ↗(On Diff #47005)

I'm expecting clients to want to do:

for (auto &C : make_range(RC.begin(), RC.find(SomeOldC)))
  ...

And thought it would be good to directly support this rather than forcing the use of make_range by providing:

for (auto &C : RC.slice(RC.Begin, SomeOldC))
  ...

It happened to end up cleaner to write the tests using direct iterators instead. I can add a unit test specifically for this API or I can wait to add the API until I have the first user?

480 ↗(On Diff #47005)

I guess it can, but there are some places where we walk the parents container and we are really planning to mutate stuff... I was mostly trying to reduce the number of const_casts I have to write. I don't feel very strongly about any of this.

585 ↗(On Diff #47005)

In this case, the postorder optimization doesn't apply.

This comment is about the fact that the DFS over the inverse DAG formed with the 'parents' sets is potentially quite far reaching. We could do some things to try to prune this space in common cases. That's all.

727 ↗(On Diff #47005)

I was originally trying to avoid digging into the Node object. Essentially, to allow the DFS to just find the SCC from the address of the node.

But I'm not sure any more that this is the right tradeoff.

Either way, moving completely away from the map seems like it could be usefully separated into a follow-up change.

lib/Analysis/LazyCallGraph.cpp
334 ↗(On Diff #47005)

For all_of, I tend to agree. But I'm not sure that this:

return std::any_of(C.begin(), C.end(), [&](Node &N) {
  return std::any_of(N.call_begin(), N.call_end(), [&](Edge &E) {
    assert(E.getNode() && "Must have formed a node within an SCC!");
    return ConnectedSet.count(G->lookupSCC(*E.getNode()));
  });
});

Is more readable than:

for (Node &N : C)
  for (Edge &E : N.calls()) {
    assert(E.getNode() && "Must have formed a node within an SCC!");
    if (ConnectedSet.count(G->lookupSCC(*E.getNode()))
      return true;
  }
return false;
358 ↗(On Diff #47005)

I think this should be "corrected the post-order" (which I've made it) but check me.

404 ↗(On Diff #47005)

Yea, much better.

568 ↗(On Diff #47005)

Well, this clearly doesn't change the big-O, but I think it pretty dramatically shifts the average case. Whenever we hit this, we skip visiting all other edges on the "pop" half of the DFS which should be a really significant savings.

Is there anything that would make you more comfortable with it? We do this optimization in two places so I'd like to get it right.

840 ↗(On Diff #47005)

I think you're totally right. Should that go here or in a follow-up patch?

1010 ↗(On Diff #47005)

I thought a lot about this, but I don't think it helps much. Let me see if I can explain why.

Ultimately, the DFS is actually over *edges*, and the edges are fundamentally attached to nodes. We could use the SCC as the "node" in the DFS, but we'd have to include both an edge_iterator and a node_iterator to mark the position in the DFS stack, and we'd still visit exactly the same number of edges.

So while it makes the code a bit awkward, I don't think we really lose anything by directly DFS-ing the nodes, and we get a significantly simpler edge iterator model.

1042 ↗(On Diff #47005)

See above.

1173 ↗(On Diff #47005)

No, there can't (as you indicate below).

1182 ↗(On Diff #47005)

Yea, the IsLeaf is essentially just a debug check. I've made it now in fact just a debug check and left a FIXME about the cost of relying on std::remove rather than knowing if this RefSCC was already a leaf RefSCC.

mcrosier removed a subscriber: mcrosier.Feb 9 2016, 6:24 AM
sanjoy added inline comments.Feb 9 2016, 8:59 AM
include/llvm/Analysis/LazyCallGraph.h
452 ↗(On Diff #47303)

I'd say lets wait till we have a user?

477 ↗(On Diff #47303)

If having the parents be a container of const RefSCC * increases the number of const_casts then what you have here is fine.

582 ↗(On Diff #47303)

Ah, I misread it as inserting an incoming call edge.

724 ↗(On Diff #47303)

Either way, moving completely away from the map seems like it could be usefully separated into a follow-up change.

SGTM. Might save a few hashtable lookups.

lib/Analysis/LazyCallGraph.cpp
336 ↗(On Diff #47303)

As discussed on IRC, the for loop is fine.

360 ↗(On Diff #47303)

SGTM

725 ↗(On Diff #47303)

Is there anything that would make you more comfortable with it? We do this optimization in two places so I'd like to get it right.

Might be helpful to explicitly document that this is a performance optimization then -- I couldn't easily tell if there's something fundamentally different going on here.

840 ↗(On Diff #47303)

SGTM

1010 ↗(On Diff #47303)

we'd still visit exactly the same number of edges.

Wouldn't you be able to skip pushing intra-SCC edges, if you're considering an SCC as a node?

we get a significantly simpler edge iterator model.

This I agree with: for the code to be readable, we'll need to add an outgoing_edges iterator to SCC, that skips intra-SCC edges.

chandlerc updated this revision to Diff 47429.Feb 10 2016, 2:14 AM
chandlerc marked 6 inline comments as done.

More updates to address review comments.

include/llvm/Analysis/LazyCallGraph.h
452 ↗(On Diff #47303)

Fine fine. ;] And I was so happy to have figured out a nice pattern here. Will try to remember it.

lib/Analysis/LazyCallGraph.cpp
725 ↗(On Diff #47303)

I've tried to expand the comments about this in both algorithms, and reference those comments from the place where we do the short-circuit. Let me know if this is helping.

1010 ↗(On Diff #47303)

I'm really not sure this is the right tradeoff... The iterator itself has to carry 2x the state in order to remember where we "paused" in our walk. But we do get to only have 1 frame in the DFS stack for each SCC.

My expectation is that SCCs with >1 node are *quite* rare in practice, and RefSCCs with >1 SCC are somewhat rare (probably under 50%, maybe under 20% in most code). Given that, I suspect that the state would almost always be a bunch of zeros and we wouldn't save a lot of depth on the stack.

But I think this is still something to potentially revisit as we go along. But I'd like to keep the algorithm as-is for now. It's a complex change and there is already too much of that here.

I'd be interested in re-visiting this and trying to see if there is a way to get the best of both worlds -- simple DFS stack and walk over edges, but handle SCCs at once so that we actually skip redundant work in the face of large SCCs.

Ping. I think this patch is getting close?

sanjoy requested changes to this revision.Feb 15 2016, 4:01 PM
sanjoy edited edge metadata.

Some more mostly minor comments:

include/llvm/Analysis/LazyCallGraph.h
52 ↗(On Diff #47429)

Looks like this header is unused.

485 ↗(On Diff #47429)

Minor: I'd be specific about "remain valid till ... (destruction of the parent LCG?)"

497 ↗(On Diff #47429)

SourceN ant? This first sentence sounds malformed.

lib/Analysis/LazyCallGraph.cpp
105 ↗(On Diff #47429)

Here and elsewhere: why not {&TargetN.getFunction(), Edges.size()} instead of an explicit std::make_pair?

110 ↗(On Diff #47429)

Why not (*this)[&TargetF].setKind(EK)?

733 ↗(On Diff #47429)

The doc updates lgtm

788 ↗(On Diff #47429)

I think this can just be ConnectedDepth = DFSStack.size() (with an assert(ConnectedDepth < (int)DFSStack.size())).

1026 ↗(On Diff #47429)

SGTM

This revision now requires changes to proceed.Feb 15 2016, 4:01 PM
chandlerc updated this revision to Diff 48105.Feb 16 2016, 1:54 PM
chandlerc edited edge metadata.
chandlerc marked 5 inline comments as done.

Update to address comments from Sanjoy.

Comments addressed.

include/llvm/Analysis/LazyCallGraph.h
52 ↗(On Diff #47429)

std::pair comes from here.

497 ↗(On Diff #47429)

Cleaned it up, sorry about that.

lib/Analysis/LazyCallGraph.cpp
105 ↗(On Diff #47429)

When I first wrote the code, not all of our compilers supported {} syntax, and this make_pair didn't end up getting completely rewritten. I can update more of them though.

110 ↗(On Diff #47429)

The public interface doesn't expose a mutable edge.

sanjoy accepted this revision.Feb 16 2016, 2:04 PM
sanjoy edited edge metadata.

Looks great!

This revision is now accepted and ready to land.Feb 16 2016, 2:04 PM
This revision was automatically updated to reflect the committed changes.