This is an archive of the discontinued LLVM Phabricator instance.

[PM/LCG] Remove the lazy RefSCC formation from the LazyCallGraph during iteration.
ClosedPublic

Authored by chandlerc on Feb 1 2017, 1:43 AM.

Details

Summary

This behavior turns out to be incompatible with ArgPromotion, which can
delete reference edges in distant ancestors of the current SCC. These
reference edges being removed can cause many problems, but the worst
case is when they are part of the lazy in-flight DFS walk forming the
RefSCCs. When this happens we have no really good ways of updating
everything.

However, the lazy formation of RefSCCs isn't really the most important
part of the laziness here -- that has to do with walking the functions
themselves -- and isn't essential to maintain. Originally, there were
incremental update algorithms that relied on updates happening
predominantly near the most recent RefSCC formed, but those have been
replaced with ones that have much tighter general case bounds at this
point. So we can simplify the entire analysis by having a single
up-front step that builds all of the RefSCCs in a direct Tarjan walk. We
can even easily replace this with other or better algorithms at will and
with much less confusion now that there is no iterator-based incremental
logic involved. This removes a substantial amount of complexity from LCG
and also gives us an easy way to implement ArgPromote -- we already
support removing edges and doing incremental graph updates. In fact,
updates for removing edges are among the most simple cases.

Another advantage of moving in this direction is that it simplifies
testing the system substantially as we no longer have to worry about
observing and mutating the graph half-way through the RefSCC formation.

We still need a somewhat special iterator for RefSCCs because we want
the iterator to remain stable in the face of graph updates. However,
this now merely involves relative indexing to the current RefSCC's
position in the sequence which isn't too hard.

I've used this in a prototype of a more thorough port of ArgPromotion and it
seems to be working well. That part needs a bit more work though to fully
handle the updates and this seems usefully separable.

Depends on D29277.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc created this revision.Feb 1 2017, 1:43 AM
chandlerc updated this revision to Diff 87082.Feb 3 2017, 10:56 PM

Update to base on plain master where it can be submitted and remove discussion
of ArgPromote. Also updated to fix a scaling issue by putting some very
expensive asserts behind the appropriate macro guard.

After some experiments I'm going to port ArgPromote in a different way than
anticipated which makes this patch not *strictly* necessary, but a huge
improvement anyways.

davide edited edge metadata.Feb 5 2017, 11:29 AM

I personally like this solution as it strips away a lot of complexity. Some comments while I do some testing on this patch.

include/llvm/Analysis/LazyCallGraph.h
1086–1090 ↗(On Diff #87082)

I think this API deserves a comment, also for consistency with everything else in the file.

lib/Analysis/LazyCallGraph.cpp
1692–1693 ↗(On Diff #87082)

Commented code?

1788 ↗(On Diff #87082)

s/bulit/built/

1797–1798 ↗(On Diff #87082)

Comment on why you need this reverse, maybe?

1810–1811 ↗(On Diff #87082)

"Push the new node into the postorder list " ...
I would also add the fact you're inserting in the indices map (or move this comment down, up to you).

dberlin added inline comments.
lib/Analysis/LazyCallGraph.cpp
1690 ↗(On Diff #87082)

Just to note, you don't actually have to push the stack here.

This will push every node onto the stack.
it's possible to push only the root nodes onto the stack.

This is more memory efficient, and about 25% faster.

This is nuutila's variant, which is easy to implement:

http://www.cse.tkk.fi/~enu/ps/ipl-scc.ps
or
http://www.cse.tkk.fi/~enu/ps/tko-b94-scc.ps
or
the tarjan scc part of https://reviews.llvm.org/D28934

(For the papers, you want algorithm 1, not 2)

I think there's still a bug/infinite loop somewhere as this doesn't terminate on some large inputs I have. I'll investigate and let you know (assuming it's unrelated from the issue Danny pointed out)

davide added inline comments.Feb 5 2017, 11:57 AM
lib/Analysis/LazyCallGraph.cpp
1690 ↗(On Diff #87082)

My understanding of Nuutila is that it's faster if the graph is sparse, and when I spoke and proposed this to Chandler he mentioned the graph maybe not-that-sparse (for some definition of). I never found the SCC computation being a bottleneck (even for large programs), but I like Nuutila's variant so I wouldn't be opposed to that (and I think it's worth a try)

dberlin added inline comments.Feb 5 2017, 12:14 PM
lib/Analysis/LazyCallGraph.cpp
1690 ↗(On Diff #87082)

It's either faster or the same in every case, and given it's switching like two lines, IMHO, i'd do it, but ¯\_(ツ)_/¯

The real speed is unrelated to whether it's a sparse or not. What it's really related to is what happens on trees/dags.

Tarjan's algorithm pushes always
Nuutila's variant only pushes if it's actually a cycle.
So nodes that are parts of trees/dags, will not end up on the stack.

davide added inline comments.Feb 5 2017, 1:28 PM
lib/Analysis/LazyCallGraph.cpp
1690 ↗(On Diff #87082)

STGM =)

davide requested changes to this revision.Feb 5 2017, 2:08 PM

Also, some of the assertions still on by default are waay to slow for large bitcode files (and LTO), O(N^2) to O(N^4) depending on the sparseness of the graph
With https://reviews.llvm.org/P7969 disables the offending assertions in release mode (and enables them only under LLVM_EXPENSIVE_CHECKS)

This revision now requires changes to proceed.Feb 5 2017, 2:08 PM
davide added a comment.Feb 5 2017, 2:10 PM

(this was the reason why it was taking forever to build, FWIW)

chandlerc marked 8 inline comments as done.Feb 6 2017, 1:20 AM

Thanks, comments hopefully addressed and new patch incoming.

include/llvm/Analysis/LazyCallGraph.h
1086–1090 ↗(On Diff #87082)

Sorry, I had meant to go back through all these comments after I got this stuff working and add the ones missing and clean up the ones that were no longer accurate. Only some got done. I've commented more effectively now (i hope) but please tell me if they could be improved further.

lib/Analysis/LazyCallGraph.cpp
1690 ↗(On Diff #87082)

FWIW, if the Nuutila variant can be implement cleanly and shows a performance improvement for some graphs (both of which I assume are true) then of course we should switch. =]

That said, I don't want to switch in *this* patch because my intent here is to very minimally relocate existing code.

I've added this as a FIXME to the missing comment on the routine though. =]

1692–1693 ↗(On Diff #87082)

An assert that I wasn't sure could be implemented generically. Indeed, it can't. That said, we have lots of other asserts and tests that should catch any issue like this, I just forgot to clean it up. Thanks for spotting1

1797–1798 ↗(On Diff #87082)

I've commented. I'm a bit torn. Not sure we should do reverse at all really, but if I don't do reverse, all of the iterations in the tests go in the opposite order and it seemed a lot of churn for no benefit. I think the reason is that we operate on this as a stack and so reversed gives the most unsurprising semantics.

That said, if you disagree I can remove this completely and just update the tests.

chandlerc updated this revision to Diff 87196.Feb 6 2017, 1:20 AM
chandlerc edited edge metadata.
chandlerc marked 3 inline comments as done.

Rebase and address review comments.

Also, some of the assertions still on by default are waay to slow for large bitcode files (and LTO), O(N^2) to O(N^4) depending on the sparseness of the graph
With https://reviews.llvm.org/P7969 disables the offending assertions in release mode (and enables them only under LLVM_EXPENSIVE_CHECKS)

Sorry I missed these the first time around, yes, all of the descendent / ancestor stuff should be behind expensive checks clearly. I thought I had gotten them all but had searched for only one. =] Updated patch coming up which should match what you pasted as well.

chandlerc updated this revision to Diff 87197.Feb 6 2017, 1:26 AM

And this time with all the crazy super-linear assertions disabled outside of
"expensive checks" mode.

chandlerc updated this revision to Diff 87199.Feb 6 2017, 1:54 AM

Another update to actually remove all the support variables for maintaining an
in-flight DFS walk from the class. Somehow forgot to remove these, they're all
dead now.

In entertaining news, if you look carefully you'll spot places where this patch
actually fixes bugs due to failing to correctly manage all these variables. Yay
for a simpler approach!

davide accepted this revision.Feb 6 2017, 10:49 AM

This LGTM now. Thanks for your patience, Chandler!

This revision is now accepted and ready to land.Feb 6 2017, 10:49 AM
This revision was automatically updated to reflect the committed changes.