Page MenuHomePhabricator

[PM] WIP: Introduce basic update capabilities to the new PM's CGSCC pass manager, including both plumbing and logic to handle function pass updates.
ClosedPublic

Authored by chandlerc on Jun 17 2016, 2:01 AM.

Details

Summary

There are three fundamentally tied changes here:

  1. Plumbing *some* mechanism for updating the CGSCC pass manager as the CG changes while passes are running.
  2. Changing the CGSCC pass manager infrastructure to have support for the underlying graph to mutate mid-pass run.
  3. Actually updating the CG after function passes run.

I can separate them if necessary, but I think its really useful to have
them together as the needs of #3 drove #2, and that in turn drove #1.

The plumbing technique is to extend the "run" method signature with
extra arguments. We provide the call graph that intrinsically is
available as it is the basis of the pass manager's IR units, and an
output parameter that records the results of updating the call graph
during an SCC passes's run. Note that "...UpdateResult" isn't a *great*
name here... suggestions very welcome.

I tried a pretty frustrating number of different data structures and such
for the innards of the update result. Every other one failed for one
reason or another. Sometimes I just couldn't keep the layers of
complexity right in my head. The thing that really worked was to just
directly provide access to the underlying structures used to walk the
call graph so that their updates could be informed by the *particular*
nature of the change to the graph.

The technique for how to make the pass management infrastructure cope
with mutating graphs was also something that took a really, really large
number of iterations to get to a place where I was happy. Here are some
of the considerations that drove the design:

  • We operate at three levels within the infrastructure: RefSCC, SCC, and Node. In each case, we are working bottom up and so we want to continue to iterate on the "lowest" node as the graph changes. Look at how we iterate over nodes in an SCC running function passes as those function passes mutate the CG. We continue to iterate on the "lowest" SCC, which is the one that continues to contain the function just processed.
  • The call graph structure re-uses SCCs (and RefSCCs) during mutation events for the *highest* entry in the resulting new subgraph, not the lowest. This means that it is necessary to continually update the current SCC or RefSCC as it shifts. This is really surprising and subtle, and took a long time for me to work out. I actually tried changing the call graph to provide the opposite behavior, and it breaks *EVERYTHING*. The graph update algorithms are really deeply tied to this particualr pattern.
  • When SCCs or RefSCCs are split apart and refined and we continually re-pin our processing to the bottom one in the subgraph, we need to enqueue the newly formed SCCs and RefSCCs for subsequent processing. Queuing them presents a few challenges:
    1. SCCs and RefSCCs use wildly different iteration strategies at a high level. We end up needing to converge them on worklist approaches that can be extended in order to be able to handle the mutations.
    2. The order of the enqueuing need to remain bottom-up post-order so that we don't get surprising order of visitation for things like the inliner.
    3. We need the worklists to have set semantics so we don't duplicate things endlessly. We don't need a *persistent* set though because we always keep processing the bottom node!!!! This is super, super surprising to me and took a long time to convince myself this is correct, but I'm pretty sure it is... Once we sink down to the bottom node, we can't re-split out the same node in any way, and the postorder of the current queue is fixed and unchanging.
    4. We need to make sure that the "current" SCC or RefSCC actually gets enqueued here such that we re-visit it because we continue processing a *new*, *bottom* SCC/RefSCC.
  • We also need the ability to *skip* SCCs and RefSCCs that get merged into a larger component. We even need the ability to skip *nodes* from an SCC that are no longer part of that SCC.

This led to the design you see in the patch which uses SetVector-based
worklists. The RefSCC worklist is always empty until an update occurs
and is just used to handle those RefSCCs created by updates as the
others don't even exist yet and are formed on-demand during the
bottom-up walk. The SCC worklist is pre-populated from the RefSCC, and
we push new SCCs onto it and blacklist existing SCCs on it to get the
desired processing.

We then *directly* update these when updating the call graph as I was
never able to find a satisfactory abstraction around the update
strategy.

Finally, we need to compute the updates for function passes. This is
mostly used as an initial customer of all the update mechanisms to drive
their design to at least cover some real set of use cases. There are
a bunch of interesting things that came out of doing this:

  • It is really nice to do this a function at a time because that function is likely hot in the cache. This means we want even the function pass adaptor to support online updates to the call graph!
  • To update the call graph after arbitrary function pass mutations is quite hard. We have to build a fairly comprehensive set of data structures and then process them. Fortunately, some of this code is related to the code for building the cal graph in the first place. Unfortunately, very little of it makes any sense to share because the nature of what we're doing is so very different. I've factored out the one part that made sense at least.
  • We need to transfer these updates into the various structures for the CGSCC pass manager. Once those were more sanely worked out, this became relatively easier. But some of those needs necessitated changes to the LazyCallGraph interface to make it significantly easier to extract the changed SCCs from an update operation.
  • We also need to update the CGSCC analysis manager as the shape of the graph changes. When an SCC is merged away we need to clear analyses associated with it from the analysis manager which we didn't have support for in the analysis manager infrsatructure. New SCCs are easy! But then we have the case that the original SCC has its shape changed but remains in the call graph. There we need to *invalidate* the analyses associated with it.
  • We also need to invalidate analyses after we *finish* processing an SCC. But the analyses we need to invalidate here are *only those for the newly updated SCC*!!! Because we only continue processing the bottom SCC, if we split SCCs apart the original one gets invalidated once when its shape changes and is not processed farther so its analyses will be correct. It is the bottom SCC which continues being processed and needs to have the "normal" invalidation done based on the preserved analyses set.

All of this is mostly background and context for the changes here. This
is still very much a WIP! I know there are bugs here (tests fail) and
I still need to actively test the core functionality here. I'll be
working on that next. Don't stress too much if things look like bugs or
something looks like it has a basic functionality error. Mostly looking
for feedback on the design, layering, and approach. I'll update ASAP
with fixes and tests.

Depends on http://reviews.llvm.org/D21462

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc updated this revision to Diff 61073.Jun 17 2016, 2:01 AM
chandlerc retitled this revision from to [PM] WIP: Introduce basic update capabilities to the new PM's CGSCC pass manager, including both plumbing and logic to handle function pass updates..
chandlerc updated this object.
chandlerc added a subscriber: llvm-commits.
silvas added a subscriber: silvas.Jun 17 2016, 3:37 AM

Some initial comments.

You've described a lot of implementation details (and implementation), but haven't described the final intended visitation behavior you are trying to implement (or why). For starters, what's wrong with the old PM CGSCC visitation order that requires doing something fundamentally different in the new PM?

The general idea is clearly to maintain a bottom-up visitation behavior, but since this patch doesn't implement edge addition (which represents devirtualization and so is pretty important) the intended behavior there is unclear. I've put a very specific question about this inline.

lib/Analysis/CGSCCPassManager.cpp
90 ↗(On Diff #61073)

What is the plan here? If I'm running cgscc(foo-cgscc-pass,function(gvn),bar-cgscc-pass) and gvn devirtualizes a call which increases the size of the SCC, what do we do after finishing the function pass manager? Do we:

  1. start back at the beginning with foo-cgscc-pass running on the now-larger SCC?

or

  1. continue on to bar-cgscc-pass running on the now-larger SCC?

or

  1. something else?

Adding subscribers from the recent "Intended behavior of CGSCC pass manager." thread.

eraman added a subscriber: eraman.Jun 17 2016, 10:01 AM
chandlerc updated this revision to Diff 61741.Jun 23 2016, 4:17 PM

Rebase and fix a fundamental bug in the callgraph update due to incorrectly
tracking edges demoted to references. Also clean up the patch itself to avoid
some weird diff churn that was present in the previous patch (sorry about
that).

Responses to questions and more comprehensive testing still coming...

Chandler, thanks for the tremendous effort for trying to make this work. At this moment, I only have very high level comments, so please bear with me.

Can we have a design document describing the underlying algorithm with formal proof (traversing with mutation)? Given the level of complexity involved and the fact there is no prior literature we can refer to, it is important we get it right.

Besides it is also more important to understand what problem the added complexity is trying to solve, why those problems are important to be solved (evidence of missing opportunities in real apps), and why they can not be solved in other ways? The same questions have been raised in Sean's email thread, and there are some interesting discussions there, so may be you can also chime in there?

As far as I can tell, for the new PM transition, the consensus is to make the transition NFC initially, which will greatly lower the bar for the new PM to become the default. If your plan is to make new PM transition depending on this update support in CGSCC pass to be ready, we need to understand why.

thanks,

David

sanjoy added inline comments.Jun 23 2016, 7:37 PM
lib/Analysis/CGSCCPassManager.cpp
170 ↗(On Diff #61741)

Doesn't this also hold if the target SCC is not this SCC?

Digging into some of these comments, will get David's in the next email...

Some initial comments.

You've described a lot of implementation details (and implementation), but haven't described the final intended visitation behavior you are trying to implement (or why). For starters, what's wrong with the old PM CGSCC visitation order that requires doing something fundamentally different in the new PM?

I'll post an update to the patch that adds a much more detailed comment to the header to try and describe what this is trying to achieve at a high level, etc. It includes some of the why, but it doesn't going into very specific details. I'm happy to do so here and add the details you or others find sufficient relevant and not too brittle to go there.

To summarize what is wrong with the old visit order: it misses specific optimization opportunities, and makes it harder to reason about the call graph updates in a principled way. The latter becomes more important when there might be cached analyses attached to the call graph nodes. Consider if we wanted to lift some of the things in FunctionAttrs into an *analysis* over an SCC rather than an attribute. Now it is essential that update to the graph structure invalidate the right set of analyses.

As an example of the kind of missed optimizations, I'm going to add a test case where we successfully deduce readnone with the new pass manager but can't (no matter how many iterations we make) deduce it with the old one. This is because we don't refine the SCC graph structure, and do principled visitation to that graph structure. The examples in the test case are somewhat contrived in order to make good test cases, but they hopefully represent the general pattern that I'm worried about here: we are using the set of functions in the SCC to reason about it as an SCC but not actually updating that set of functions when the code changes.

The general idea is clearly to maintain a bottom-up visitation behavior, but since this patch doesn't implement edge addition (which represents devirtualization and so is pretty important) the intended behavior there is unclear.

So, this *does* implement ref -> call promotion which is what I would expect devirtualization to look like in all cases. See below for details on your question.

The other case is for "out-of-thin-air" call (or ref) insertion such as instrumentation passes might do. Currently this isn't supported but could be added in the future.

lib/Analysis/CGSCCPassManager.cpp
90 ↗(On Diff #61741)

(in case reading this against the current patch, this originally was attached to the FIXME regarding handling adding new calls)

As i somewhat alluded to above, what you describe should be handled by a ref edge turning into a call edge, and the update mechanism should be able to handle that well.

I've added a test case that exercises this with GVN and function-attrs. There is currently a case missed in the new pass manager because we don't have the up-to-four iteration whenever an indirect call turns into a direct call heuristic that the old pass manager has. I'm happy to add that, but I'd like to add it in a follow-up patch. I've marked where in the test case this is missed, and I've demonstrated that in theory this update mechanism is sufficient to handle it by explicitly running function-attrs again and it correctly catches the refinement.

The direct answer to your question is #2: it continues running on the now-larger SCC, detects that we switch from one SCC to another at some point, and re-runs on that SCC to make sure that the refined graph is observed.

170 ↗(On Diff #61741)

Yes, and inside the implementation of switchInternalEdgeToRef, it early exits with an empty range when we hit that scenario.

I can lift that distinction up into this code if you think that would be helpful, the somewhat arbitrary split was that these methods are on a RefSCC, and so it is the RefSCC-easy case that callers have to handle.

Chandler, thanks for the tremendous effort for trying to make this work. At this moment, I only have very high level comments, so please bear with me.

Can we have a design document describing the underlying algorithm with formal proof (traversing with mutation)? Given the level of complexity involved and the fact there is no prior literature we can refer to, it is important we get it right.

I've added an overview to the top of the file that I hope answers a lot of the high level questions.

I don't have a formal proof of this, nor do I really know what you would want to prove here. What's the concern? I am working on building up a set of test cases that I think exercise the kinds of updates. I've not finished with that yet, but so far it seems to be hitting lots of the different aspects of this code.

Besides it is also more important to understand what problem the added complexity is trying to solve, why those problems are important to be solved (evidence of missing opportunities in real apps), and why they can not be solved in other ways?

See my response to Sean and my comments in the updated patch (coming shortly) for some of the details here.

I view this much more about getting the call graph management into a principled state with a path that supports the different goals. The new pass manager's requirement of *identity* for things to support analysis caching was a key element that drove the development of the LCG in the beginning. Without that kind of identity, it becomes extremely hard to reason about a caching approach to analyses. I can't claim its impossible to get something to work, but I tried and was unable to see a clear and clean path that made sense to me, so I worked on building up the infrastructure necessary to have identity.

Once you have identity, you need to be very careful with updates because it makes it easy to transform a "benign" relaxation of the model (the old PM's approach of folding more nodes into the current SCC) into a correctness bug. As a consequence I've tried to take a very principled approach to the call graph update where we can verify and check that the graph structure has a particular shape and structure.

Ultimately, I think this is the right way to design the call graph portion of the pass manager because I think other designs will inevitably run up against the limitations that their relaxed model imposes. In fact, I'd like to generalize and make more principled even the idea of the "reference graph" because I think it hasn't gone far enough yet, but that's for a future iteration.

The same questions have been raised in Sean's email thread, and there are some interesting discussions there, so may be you can also chime in there?

That email thread happened when I didn't have time to read it as it went, and after skimming it I was unable to really understand what open questions there were.

This *has* actually been discussed before, if you look at the discussions that led to the LazyCallGraph design. I don't know that we have a complete record of the discussion, but I've tried to capture the sentiment that folks were left with in terms of what direction the design should go in the comment I added to the top of the file.

None of that is to say that we shouldn't revisit the design decisions if they are wrong of course! I just don't want to give the impression that there was *no* discussion about this. We should still try to figure out if there is a problem with this approach or a better approach that we should take.

As far as I can tell, for the new PM transition, the consensus is to make the transition NFC initially, which will greatly lower the bar for the new PM to become the default. If your plan is to make new PM transition depending on this update support in CGSCC pass to be ready, we need to understand why.

I don't actually agree with this. This has come up several times on several different threads with myself, Hal, Philip and numerous others.

I think it would be *nice* if we could make the transition NFC, but I personally do not believe that is reasonable given the degree of difference between the designs. To me, it is very fundamental that when you have a caching analysis layer that understands the SCC organizational unit of IR, you need a different strategy for handling updates and iteration. Trying to force things to look exactly the same will, IMO, cause the design of the new pass manager to be worse.

That said, we do have to have a way to migrate. My plan has been to simply make the new approach superior in terms of benchmark numbers, and at least not a significant regression in terms of compile time. Given the power of a more call graph aware refinement-driven iteration order, and the compile time savings of caching analysis, I think both of these will be reasonably attainable. However, if we try, and they prove very difficult, I think we will still be able to layer a constraint on top of the current design that more precisely mimics the old behavior in order to ease migration. I don't think it will come to this, and I also think that if it does, having what we think is the principled design (but which needs more work to address regressions) underneath will ensure that the compatibility mode sits cleanly on top and doesn't twist the core in an unfortunate direction.

chandlerc updated this revision to Diff 62157.Jun 28 2016, 5:58 PM

Major update that includes several fixes to correctly respect the call graph
updates as well as the first round of test cases that exercise the update
logic. 'cgscc-iterate-function-mutation.ll' is perhaps especially interesting
because it demonstrates a fundamental property that the old pass manager cannot
handle in its current form and that the new pass manager handles cleanly.

This also adds a lot of asserts and debug logging that helped me debug and fix
many issues.

Still working on more test cases, should be able to move this out of 'WIP'
status really soon though.

silvas added inline comments.Jun 28 2016, 6:33 PM
lib/Analysis/CGSCCPassManager.cpp
90 ↗(On Diff #61741)

As i somewhat alluded to above, what you describe should be handled by a ref edge turning into a call edge, and the update mechanism should be able to handle that well.

Please add test cases exhibiting this.

The direct answer to your question is #2: it continues running on the now-larger SCC, detects that we switch from one SCC to another at some point, and re-runs on that SCC to make sure that the refined graph is observed.

Okay, please add a test case for that. Also add a test case demonstrating that we don't go quadratic on a graph like http://reviews.llvm.org/F2110388

digraph "foo bar" {
  rankdir=LR
  A -> B; B -> A [style=dashed,label="ref"];
  B -> C; C -> B [style=dashed,label="ref"];
  C -> D; D -> C [style=dashed,label="ref"];
  D -> E; E -> D [style=dashed,label="ref"];
}

where function passes manage to devirtualize all the ref edges.

sanjoy added inline comments.Jun 28 2016, 6:41 PM
include/llvm/Analysis/CGSCCPassManager.h
77 ↗(On Diff #62157)

Can you clarify here what happens when there isn't a well defined "bottom"? I.e.

A -> B -> C -> A
A -> C

becomes

A -> B
A -> C
chandlerc added inline comments.Jun 28 2016, 10:01 PM
lib/Analysis/CGSCCPassManager.cpp
327 ↗(On Diff #62157)

As i somewhat alluded to above, what you describe should be handled by a ref edge turning into a call edge, and the update mechanism should be able to handle that well.

Please add test cases exhibiting this.

Er, the next sentence in what I wrote was:

I've added a test case that exercises this with GVN and function-attrs.

I think the updated patch has this test case in it. It is cgscc-observe-devirt.ll

The direct answer to your question is #2: it continues running on the now-larger SCC, detects that we switch from one SCC to another at some point, and re-runs on that SCC to make sure that the refined graph is observed.

Okay, please add a test case for that.

This is the same thing, and covered by the same test case. I was just trying to make sure I directly answer your question as well...

Also add a test case demonstrating that we don't go quadratic on a graph like http://reviews.llvm.org/F2110388 where function passes manage to devirtualize all the ref edges.

I'm not really sure what you want here... In general it is very hard to have a test case in the regression test suite that demonstrates a lack of quadratic behavior -- it typically requires an unacceptably large test case even when the behavior is linear.

There are also a bunch of things that might "go quadratic" in this case. There are FIXMEs in the code for some of these things that I would like to address, but probably don't belong conflated into this patch...

Based on the example you post, I think I've figured out that you are trying to point out a case where we will run the SCC pass manager over the function E as many times as we successfully devirtualize edges somewhere in the SCC containing E in a way that brings a new node into that SCC. If I've understood this correctly, then I agree, and that's a nice find. I think it is unlikely to be a problem in practice, but it is definitely something we would need fixed to finish deploying this, probably with just a cap to limit things as a very large SCC formed in this way seems unlikely to be a practical concern to optimize heavily. Given that, I'm inclined to make a FIXME or note about this rather than trying to address it within this patch as that seems like it would bottleneck things.

Did I understand correctly? Does that approach make sense?

chandlerc updated this revision to Diff 62319.Jun 29 2016, 5:22 PM

Update to resolve some issues, add more tests, and make the tests actually
exercise more logic and exercise it more effectively.

chandlerc added inline comments.Jun 29 2016, 5:30 PM
include/llvm/Analysis/CGSCCPassManager.h
77 ↗(On Diff #62319)

Sanjoy helped me better understand his concern here, and I've added a 'test3_*' collection of functions to cgscc-observe-devirt.ll that model this. The key is that a scalar transform introduces a post-order constraint not previously present, and if we don't visit in order, we will fail to observe the refined context.

In that test case, if we fail to visit these in order, we will fail to deduce readonly for b1, b2, and b3. Before the most recent update, we failed that test. To fix this, we just have to use the *other* effect of switching a ref to a call edge -- the postorder list within the RefSCC may be updated and we need to adjust our visit order to reflect this (as well as revisit the current SCC which may now have refined context for what it calls). I've added this logic (using a worklist that can have the order adjusted which i separated into D21866) and now we correctly handle this case.

Thanks for bringing it up Sanjoy!

chandlerc updated this revision to Diff 62329.Jun 29 2016, 7:28 PM

Add some more test cases (that already worked), asserts (that caught a bug
where we visit things too often), and the necessary plumbing to correctly
filter these extra visits.

At this point, I think the implementation is getting pretty good. I don't have
a lot of significant ideas of what else to test; Sanjoy's example really hit
the large remaining area that needs to be handled correctly.

I'm going to start working on the next layers (the devirt iteration driver and
the inliner).

Some initial comments.

include/llvm/Analysis/CGSCCPassManager.h
169 ↗(On Diff #62329)

Please add specific documentation about how this CGSCCUpdateResult is used by CGSCC passes to communicate the modifications. One possibility for "documentation" may be to add unit tests covering basic patterns like how to update them when removing a function, how to update them when replacing a function, how to update them when adding edges, etc.

But I would also expect at least some basic comments on each member explaining under what scenarios it needs to be used and how.

388 ↗(On Diff #62329)

This is error prone passing in both C and CG. For a given C there is only one valid CG and this signature gives the false impression that there are two degrees of freedom here. You already have the relevant back pointers to fetch the CG as needed so just use that.

chandlerc updated this revision to Diff 68945.Aug 22 2016, 7:30 PM
chandlerc marked an inline comment as done.

Rebase, clean up the description a bit, and add some much belated comments to the update API.

FWIW, I think a lot of the issues here are at least much better understood. I've got a patch out that solves the most pressing of the invalidation problems within the framework of this design. I've got a devirtualization iteration utility posted. I've ported the inliner, and will mail that shortly -- i mostly was trying to test it a bit more heavily before sending it out, but I think that's counter-productive at this point.

There were a couple of specific comments on this patch outstanding that I've tried to address. Sorry for the delay there.

There are still several invalidation problems that need to be fixed. But I've started fixing them, and have the beginnings of patches, but I'm having a lot of trouble testing the fixes effectively. I'd like to do a fairly substantial refactoring of the unit tests, but I'd really rather do that *after* this lands so that I don't have to refactor the unittests twice essentially (all of the APIs will change with this patch).

Hoping this patch is in a state where folks can review again. I know the algorithms are pretty long, and there may even be bugs there that we'll have to fix as more testing lines up, but so far the testing of this particular part of the change seems to be working out fairly well -- I have way more bugs I need to fix in invalidation than in the graph update so far. =D

include/llvm/Analysis/CGSCCPassManager.h
169 ↗(On Diff #62329)

Yea, I had forgotten to do this for a long time because I kept changing what the members of this were. I never went back and documented them once things settled into a final form. Sorry about that.

388 ↗(On Diff #62329)

Unfortunately, there aren't a lot of great alternatives.

This replaces an already error prone pattern where the pass would immediately grab the analysis by calling 'getCachedResult' and asserting that it got a non-null pointer back. By passing in a reference to the deeply fundamental analyses I think it makes it more clear to the caller that these analyses have to be provided.

We could package all of these arguments inside a struct (possibly with an up-pointer in the SCC, but possibly with some other struct that bundles them). But I'm not sure that will really be a net improvement. I expect that functions which give this parameter a name at all would want to unpack any such struct immediately.

It is perhaps a bit weird that this API is really designed to optimize for implementor, but in a sense it is. This API gets implemented for each pass, but called in only a very few places.

Still, I'll definitely add some comments here to help make it very clear what the expected relationship is between the arguments.

sanjoy accepted this revision.Aug 23 2016, 7:06 PM
sanjoy edited edge metadata.

Looks good overall, some minor comments inline. Feel free to push after fixing (or address them post-commit).

include/llvm/Analysis/CGSCCPassManager.h
42 ↗(On Diff #68945)

It wasn't clear what you meant by "denormalized indirect references".

160 ↗(On Diff #68945)

s/rammifications/ramifications/

292 ↗(On Diff #68945)

Why do you do this nested scheme instead of queueing all of CG.postorder_ref_sccs() into RCWorklist? The answer is probably worth a comment.

437 ↗(On Diff #68945)

Shouldn't this be "across every function in the SCC."?

include/llvm/Analysis/LazyCallGraph.h
585 ↗(On Diff #68945)

Document what you're returning? Especially your assumption that the first node in the returned range is the SCC containing the source node.

Might be worth it to add a unit test for this new behavior (no need to block this patch on that though).

lib/Analysis/CGSCCPassManager.cpp
47 ↗(On Diff #68945)

Range for?

86 ↗(On Diff #68945)

I understand this this is a local utility, but I still think it will be helpful to add one or two lines about what it does (or use a more descriptive verb than "process").

92 ↗(On Diff #68945)

Not sure if the default for DebugLogging adds any value.

111 ↗(On Diff #68945)

This confused me for a moment -- how about calling the return value NextC?

145 ↗(On Diff #68945)

I found these names a little off, I'd have called them PromotedRefTargets and DemotedCallTargets instead, since they're ref targets that were promoted and call targets that were demoted respectively.

199 ↗(On Diff #68945)

Might be worth adding a Edge::getKind helper to use here.

240 ↗(On Diff #68945)

In processNewSCCRange you assume that the SCC containing N will be the first one in the post-order, but you don't assume the corresponding fact for RefSCC s here. Why do we have this asymmetry?

244 ↗(On Diff #68945)

s/Enqueing/Enqueuing/

307 ↗(On Diff #68945)

I'd s/valid/precise/ here (I can't imagine a real-world analyses that would be _incorrect_ after splitting out SCCs).

328 ↗(On Diff #68945)

s/Enqueing/Enqueuing/

329 ↗(On Diff #68945)

Enqueue

This revision is now accepted and ready to land.Aug 23 2016, 7:06 PM
chandlerc marked 12 inline comments as done.Aug 24 2016, 12:11 AM

Thanks so much Sanjoy! I've tried to essentially address all of these (with a couple of exceptions where I'd like further guidance on how to best address them in follow-up commits below).

Landing now, and will be preparing some follow-up patches and the next significant chunk of functionality for review.

include/llvm/Analysis/CGSCCPassManager.h
42 ↗(On Diff #68945)

Sure, I've spelled this out better in the comment, just let me know if it still isn't sufficietnly clear.

292 ↗(On Diff #68945)

Commented. For easy discovery, the crux is that CG.postorder_ref_sccs() is lazy, and we want to be as lazy as we can be. The worklist exists for when we discover new RefSCCs during transformations.

include/llvm/Analysis/LazyCallGraph.h
585 ↗(On Diff #68945)

Gah, thought I had done that. Thanks for spotting it.

Documentation updated, and yea I'll get this covered by a unit test. (It *is* covered by the pass manager tests as well, but a unit test would be really good here.)

lib/Analysis/CGSCCPassManager.cpp
86 ↗(On Diff #68945)

Indeed, i've already written code that wanted to use it again. I've both tried to give it a better name and improved the comments.

92 ↗(On Diff #68945)

I don't think it does either. I'd really like to move all of the logging to use DEBUG and the DEBUG_PASS infrastructure. But I didn't want to do that in this change and the code already has the DebugLogging threaded through really pervasively.

111 ↗(On Diff #68945)

Hmm, but it's not the next C... its the "current" C. Or the component containing N.

Let's keep chatting to see if there is a better name, and I'll fix this up with whatever we come up with.

145 ↗(On Diff #68945)

Agreed.

240 ↗(On Diff #68945)

My memory is because of the nature of how we split SCCs vs. how we split RefSCCs. For the former we use the existing postorder sequence to constrain the precise returned order. For the RefSCCs, while the order is deterministic, it isn't predictable in the same way, and so we might have sibling RefSCCs and the node might end up in any particular one.

I think this would be worth fixing so that we can make a symmetric assumption. It may even already happen to be true. But I would want to spend some considerable time re-examining the RefSCC splitting algorithm to ensure this is by design and actually something we can guarantee. And we'd need a bunch of testing for it.

So, maybe this just needs a FIXME or a comment. What do you think? Where would you put them?

This revision was automatically updated to reflect the committed changes.
chandlerc marked 5 inline comments as done.

Replied to comments.

lib/Analysis/CGSCCPassManager.cpp
92 ↗(On Diff #68945)

I meant passing in false for the default value of DebugLogging. Looks like you explicitly pass in DebugLogging for both the places that call processNewSCCRange so the default isn't needed.

111 ↗(On Diff #68945)

I was reading this as a "transition function" that returns the new state of the algorithm (so it returns the "next" state).

Passing in LazyCallGraph::SCC *&C and updating C in place is fine too, in which case I'll read the function as "destructively modify these variables to reflect that we have a set of new SCC s".

However, this is not a big deal, and using C is fine here.

240 ↗(On Diff #68945)

A comment noting that this was intentional when written and not an oversight will be great.