Page MenuHomePhabricator

[proof of concept] Port old PM CGSCC visitation logic to new PM

Authored by silvas on Jun 30 2016, 11:06 PM.



This patch implements a proof of concept port of the old PM CGSCC visitation logic to the new PM. It also ports all the remaining CGSCC passes (inliner, argpromotion, prune-eh). I have done some basic sanity checks to make sure things work. Tomorrow I'll add a CC1 flag to clang that allows running new-PM passes and give test-suite a shot.

This was done as a sort of mad dash to get a proof of concept; all the changes are meant to be as mechanical as possible. I fully expect to have to redo every change here incrementally and with proper attention to detail, but seeing it all come together and working is important IMO. Especially the port of the inliner can be done much more cleanly than I have done here.

Most of the patch is just porting the passes and threading state through various places.

But pay attention to include/llvm/Analysis/CGSCCPassManager.h
That is where the real business happens. Especially the "run" methods of ModuleToPostOrderCGSCCPassAdaptor and CGSCCToFunctionPassAdaptor. In total, they are less than 100 lines of code which are at the heart of this patch.


Diff Detail

Event Timeline

silvas updated this revision to Diff 62465.Jun 30 2016, 11:06 PM
silvas retitled this revision from to [proof of concept] Port old PM CGSCC visitation logic to new PM.
silvas updated this object.
silvas set the repository for this revision to rL LLVM.
silvas added a subscriber: llvm-commits.

With this experience of doing this proof of concept, I can now say with confidence that porting the old CGSCC visitation order interferes minimally with what Chandler is doing in
(I know this because every change that he will have to undo will be a change that I had to do in preparing that patch; I can say that in fact they were a very small part of preparing this patch.
In fact, there is no reason why Chandler's new logic can't live in parallel with this kind of port of the old logic; the hardest part of that is likely to just be deciding on the name for distinguishing the two.
There is thus little downside to moving forward with porting the old PM visitation logic to the new PM.

I would like to establish consensus for moving forward with porting the old PM visitation logic to the new PM (or discovering a good reason why that would be a bad idea!). The huge advantage of doing this is that it is as NFC as possible.

I'd like to emphasize that there are still significant work items to be done as part of the transition to the new pass manager:
(Please add more bugs there if there is anything else you can think of!)
In order to get to the point that we can delete all these old-PM shims in the middle-end, we have things like optnone/optbisect, bugpoint support, debug-pass=Structure, and pass manager builder (the new PM doesn't know about dependencies which has the potential to make this quite non-trivial).

I'm sure there will be plenty of other yaks to be found when actually putting the new PM through its paces on real production compilation workloads. There are e.g. lingering issues with excessive memory use due to excessive caching and I'm sure plenty of unknown unknowns. Also unexpected perf swings due to new PM vs old PM will be fun to investigate :)
(One thing I discovered while doing this proof of concept port was that the "normal" use of BasicAA in the new PM uses domtree but it doesn't in the old PM; it was causing crashes in some places (see the note in FunctionAttrs.cpp; domtree would crash on declarations there so I had to add an isDeclaration check))

If it's more convenient for people to fetch, I've also pushed my local git branch to github:

silvas updated this revision to Diff 62609.Jul 2 2016, 5:34 PM
silvas removed rL LLVM as the repository for this revision.

Updated patch. I've done some cleanup/refactoring on trunk which generally reduces the diff.

This also includes a fix for
With that fix, this patch now can pass test-suite (build and run) with:

CMAKE_{C,CXX}_FLAGS:STRING=-Xclang "-newpm-passes=module(function(sroa,instcombine,simplify-cfg,jump-threading),globaldce,globalopt,deadargelim,function(sroa,instcombine,simplify-cfg),require<profile-summary>,require<targetlibinfo>,cgscc(prune-eh,inline,function-attrs,argpromotion,function(instcombine,simplify-cfg,instcombine,jump-threading,simplify-cfg)))" -Xclang -newpm-aa-pipeline=basic-aa,scoped-noalias-aa,type-based-aa,globals-aa

I'll continue working towards getting the clang O3 pipeline ported and keep iterating on test-suite.

davide edited edge metadata.Jul 3 2016, 10:12 PM

First round of comments. I feel like this patch is important because it shows up the old semantic can be ported to the new PM with (relatively) little effort. I don't quite understand all the corner cases of the CGSCC pass manager but your patch looks to me kinda mechanical (that's actually great because it means the old code fits in the new model nicely). I'll probably comment further, and take another look. As Chandler spent a fair amount of time on the problem I'd really love to hear what's his feedback/what are his comments. While I think that eventually we're going to land his work probably I feel this is still a valuable incremental step (and NFC).



just remove it ? =)


or maybe move this stuff to the destructor? I think Chandler mentioned that's the equivalent of doFinalization() in the new PM


Can you please exapnd this comment a little bit more?


maybe this can be private?


Do you mind to put a doxygen comment briefly explaining what this struct/class does?


XXX -> FIXME? Here and everywhere else in the patch where it applies


This preserves the CFG, add a FIXME etc etc...

Thanks for the comments Davide. As a quick reminder, the purpose of this patch is just to be a proof of concept that we can all agree is the right direction (or figure out why not). The patch is not intended to be committed as-is to trunk, so I'd like to avoid discussing usual code hygiene / "janitorial" details -- I've deliberately skimped on some of them or done things in a suboptimal way just to keep things mechanical.


I guess I should have said this before, but all the things marked with XXX are things that I wanted to highlight for reviewers. They are just comments about non-obvious things that I thought reviewers would find useful. For example, if somebody is looking at the code of CGPassManager::runOnModule and comparing it with here, this will save them the time of going and looking through the existing passes just to come to the same conclusion.

Some of the XXX comments will likely turn into regular comments, other will just be deleted because they are only relevant to this proof of concept and won't be issues for the real patches that will be committed to trunk (e.g. commented-out code :) ).


It isn't a FIXME. It is a note for reviewers. It will turn into just a regular comment since there isn't really anything to "fix" per se.

silvas updated this revision to Diff 62936.Jul 6 2016, 12:38 PM
silvas edited edge metadata.

Remove workaround for PR28400. I originally misdiagnosed the problem. r274656 is a better fix for the moment.

silvas added a comment.Jul 7 2016, 3:06 PM

Sorry I didn't mention this earlier, but the basic idea here is:

  1. The outer loop of CGPassManager::runOnModule and CGPassManager::RunAllPassesOnSCC goes in ModuleToPostOrderCGSCCPassAdaptor
  2. CGPassManager::RunPassOnSCC goes in the CGSCCToFunctionPassAdaptor

The rest is pretty much just fallout from doing that.

chandlerc edited edge metadata.Jul 7 2016, 7:08 PM

Mostly a high-level comment below on the core of this patch's approach.


This is the core of this change, and it highlights the problem I have with this design. But I want to emphasize something: I freely admit that you can fix the immediate failure modes. I'll even suggest how you could do that. I still think that it is a problematic design though, and hopefully this helps surface why.

So, the key thing here is that you are building "stable" SCCs with identity on the fly by allocating CallGraphSCC objects in the vector of unique pointers. You then use their identity to thread the analysis management layers. Notably, you use it to access the FunctionAnalysisManager via the CGSCCAnalysisManager via the ModuleAnalysis manager.

Both the lookup below of the FunctionAnalysisManager's proxy and the invalidate here use this identity.

However, if you were to run multiple CGSCC pass managers one after the other, they would build different SCCs each time. There is no way to persist this, at the end of the CGSCC pass manager run, the SCCs are going away.

To make this code correct, you should probably use the 'clear' method on the CGSCCAnalysisManager at the end of the CGSCC pass manager run. I think there are some bugs in the current pass manager code you'll hit if you do that. We can write some unit tests that hit these (sorry, I missed them when originally working on it) and fix them. Once fixed, that should make this approach correct AFAICT. However, I don't like some of the consequences.

If we fix the bugs in the 'clear' code path in the new pass manager, this will have the effect of invalidating all function analyses computed thus far (assuming it follows the current design direction of the pass manager thus far). And one of the really big goals of the new pass manager design was to reduce the re-computing of analyses through caching.

The way I see to continue to get caching for function analyses with this approach would be to avoid doing downward invalidation[1]. That is, when an SCC pass invalidates SCC analyses, don't have that trigger invalidation of function analyses. We could probably make that direction work, but it would mean manually tracking function analysis invalidation here and in the module pass manager rather than relying on the wiring between the proxies to do this for us.

So that is what I think it would look like to make the approach in this patch at least functional for the passes as they exist in the tree today.

Beyond that, as Sean has said, this doesn't really have a good mechanism to support SCC-based analyses. They will currently work, but they will be cached against an SCC that isn't being updated, and they won't survive the CGSCC pass manager run or be queryable. Notably, if we want an SCC analysis, we won't be able to query it from a module pass without duplicating the logic here to run the SCC iterator over the module and produce SCCs that we can then query the CGAM with.

Now, we have no need of that today. But I think it is a mistake to design in a different limitation like this.

I understand that your goal in this Sean is to make incremental progress. However, I am personally much more interested in figuring out the long term design for the CGSCC passes and the new pass manager. I think it will create churn to maintain all of the different permutations, as I think even with this approach we will have significant differences and challenges with the new pass manager. And so I think we will end up supporting both the legacy pass manager, and this one, *and* the one with stable SCCs that provides a more direct solution to the analysis challenges outlined above.

[1]: "downward invalidation" is something I had planned to add, but hadn't gotten to yet. It clearly would benefit from being visible in the code though, so I'm going to work on a short patch that adds it and unit tests that show how it would work. Naturally, we can always go a different direction if needed.

silvas updated this revision to Diff 65217.Jul 22 2016, 10:21 PM
silvas edited edge metadata.

Update after r276515

davide resigned from this revision.Dec 14 2016, 9:05 PM
davide removed a reviewer: davide.

While I considered this a reasonable approach, it seems Chandler is going in another direction. Feel free to add me back as reviewer in case/when this unblocks.

reames resigned from this revision.Feb 25 2020, 9:00 AM

Resigning from a stale review (2016). Feel free to re-add if thread ever revived.

silvas abandoned this revision.Mar 25 2020, 6:30 PM