This is an archive of the discontinued LLVM Phabricator instance.

[NewPM] Fix a nasty bug with analysis invalidation in the new PM.
ClosedPublic

Authored by chandlerc on Mar 26 2019, 11:55 PM.

Details

Summary

The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.

However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly *but* it coming off the worklist. Unclear how
important this use case really is, but I wanted to call it out.

Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.

The motivating test case now works cleanly and is added for
ArgumentPromotion.

Event Timeline

chandlerc created this revision.Mar 26 2019, 11:55 PM

Fix up the CHECK lines in my new test case.

This makes complete sense to me. There is currently no way to invalidate an analysis for a specific IRUnit instance only, right? So not preserving an analysis cross-SCC will invalidate it for all?

This makes complete sense to me. There is currently no way to invalidate an analysis for a specific IRUnit instance only, right? So not preserving an analysis cross-SCC will invalidate it for all?

Correct.

philip.pfaffe accepted this revision.Mar 27 2019, 12:58 PM

Okay. Any plans to improve it? On the other hand, I'm not sure about the impact. We'll probably have to get some profile data first.

Anyways, this patch looks good!

This revision is now accepted and ready to land.Mar 27 2019, 12:58 PM
wmi added a comment.Mar 27 2019, 2:26 PM

Thanks for working on the patch which will fix several build failures found internally! I believe the patch is already after a lot of balancing, and here are just some questions out of curiousity.

In the patch, I saw two targets to achieve: T1. lazy invalidation. T2. maintain pass-manager driven invalidation scheme (mentioned in a comment). And I saw two compromises made, C1. If a SCC changes its IR and invalidate some analysis, the invalidated pass will be propagated not only to its SCC parent, but also to its grandparent and grandgrandparent ... or even SCC not being an ancestor on the call graph if only it is visited after the current SCC. C2. For CGSCC pass other than ArgumentPromotion, like inliner pass, even if it will not touch the IR other than the current SCC like what ArgumentPromotion does, it will still invalidate the analysis of other SCCs after this change.

lazy invalidation seems to be achieved using the UR.CrossSCCPA set, so the purpose of the compromises seems mainly to maintain passmanager driven invalidation scheme. From my understanding, the scheme means invalidation initiated by pass manager instead of single IR Unit, and pass manager can only invalidate the analysis of the IR managed by itself.

I am wondering for C1, Can we have multiple CrossSCCPA instead of one (like a map, every SCC has its own CrossSCCPA) so if only the current SCC hasn't made any change, it can propagate preservedall to its parent SCC. For relieving C2, it seems to require returning PreservedAnalysesCrossSCC instead of PreservedAnalyses for each SCC pass so each pass can set PreservedAnalysesCrossSCC accordingly. That seems not very easy with little churn.

Anyways, this patch looks good!

Thanks, will land it momentarily, but I do want to dig into your questions and Wei's questions here.

Okay. Any plans to improve it? On the other hand, I'm not sure about the impact. We'll probably have to get some profile data first.

I don't see any really good ways of implementing this. See below, as I think Wei was also hitting on this, and I'll try to answer in a bit of detail.

But yes, if we find that this causes a serious drop in available cached results, we can investigate alternatives. My suspicion is that *this* won't be the real problem though, and so I'm not too worried.

In D59869#1445186, @wmi wrote:

Thanks for working on the patch which will fix several build failures found internally! I believe the patch is already after a lot of balancing, and here are just some questions out of curiousity.

In the patch, I saw two targets to achieve: T1. lazy invalidation. T2. maintain pass-manager driven invalidation scheme (mentioned in a comment). And I saw two compromises made, C1. If a SCC changes its IR and invalidate some analysis, the invalidated pass will be propagated not only to its SCC parent, but also to its grandparent and grandgrandparent ... or even SCC not being an ancestor on the call graph if only it is visited after the current SCC. C2. For CGSCC pass other than ArgumentPromotion, like inliner pass, even if it will not touch the IR other than the current SCC like what ArgumentPromotion does, it will still invalidate the analysis of other SCCs after this change.

lazy invalidation seems to be achieved using the UR.CrossSCCPA set, so the purpose of the compromises seems mainly to maintain passmanager driven invalidation scheme. From my understanding, the scheme means invalidation initiated by pass manager instead of single IR Unit, and pass manager can only invalidate the analysis of the IR managed by itself.

I am wondering for C1, Can we have multiple CrossSCCPA instead of one (like a map, every SCC has its own CrossSCCPA) so if only the current SCC hasn't made any change, it can propagate preservedall to its parent SCC.

This seems really appealing, and I thought a bunch about it. I hit some really difficult problems though and don't really see a way to make it work. That said, if you see something I'm missing, please say so. It would certainly be convenient.

So, first, let's think about how to do this while still being lazy -- that is, the pass itself doesn't *eagerly* invalidate anything. Also, I'll use ArgumentPromotion as an example (with the caller being mutated while processing the callee SCC).

What we could do is, as you suggest have a map. The next question is, what key would we put in the map? It turns out this is ... a really tricky question. Let's assume that the caller is in an SCC (that was the case we hit in the wild certainly). In that case, we could put the caller's SCC into the map as a key. This creates a bunch of new problems unfortunately. The inliner or another pass may merge this SCC with other SCCs or split it appart. We'll have to update the map every time, and it may be somewhat complex to do the update. But it is worse than that, the function in question may *move* between SCCs. So we'll also have to be able to think about that form of update. Maybe we can solve this, but it already started to look very unappealing to me in terms of complexity and cost. Then I started thinking about the case where the caller isn't in an SCC *at all* because we haven't built the callers part of the call graph yet! So we would also need a two-level map so that we can add invalidation into SCCs that we form in the future from functions that have not yet been visited.

Ultimately, storing enough information to do the invalidation for individual IR units seems definitely do-able, but not trivial. I think we'd need to have a decent amount of infrastructure to update everything and make sure it stayed in sync as the structure changes. This is a big reason why I didn't try harder to address the issue in this direction. We can go down the road if we end up in practice *needing* this kind of preservation, but it does seem quite difficult.

A different approach would be to invalidate things eagerly. This removes any need to track things, which is nice. But it forces each pass that mutates code outside of its SCC to know all the details of invalidation. It also will perform that invalidation at a surprising step when nothing else would naturally be operating on that unit of IR. It also removes the ability to coalesce multiple invalidation events into a single one. That also seemed quite a high cost.

So this is how I ended up with the compromise here. It seemed like a basically minimal burden to get the most important preservation while still keeping things correct.

For relieving C2, it seems to require returning PreservedAnalysesCrossSCC instead of PreservedAnalyses for each SCC pass so each pass can set PreservedAnalysesCrossSCC accordingly. That seems not very easy with little churn

This is an interestingly different compromise to tackle. I don't quite understand your suggestion (changing the return type of the pass seems like it would be fairly disruptive from an API / code perspective). But I think the idea here is very straight forward. The way I would do it is to have a synthetic "analysis" ID that can be preserved which symbolically represents preserving all other SCCs' analyses. Then, before updating the CrossSCCPA, we could check if that symbolic analysis is preserved. This would allow the inliner to mark in its pass PA.preserve<SomeSymbolicNameHere>() and avoid any extraneous invalidation.

I'm fine adding this if we want it, but it wasn't clear there was any realistic advantage. I don't really expect that many cached analysis results to exist for parent functions. It seemed like the case we found in the wild was much more of the edge case. But if you or any others are seeing more significant caching here, then yes, we should add some symbolic analysis so that we can preserve it.

It also has the nice effect of remaining conservatively correct -- a pass will have to explicitly think about the fact that it is not mutating other SCCs before setting this.

wmi added a comment.Mar 27 2019, 4:02 PM

Thanks for the detailed answer for how to make the choices!

This revision was automatically updated to reflect the committed changes.

Pushed this through our java-based fuzzer, no problems found so far
(though our use of SCC-based passes is very limited).

wenlei added a subscriber: wenlei.May 21 2020, 9:24 AM
wenlei added inline comments.
llvm/trunk/include/llvm/Analysis/CGSCCPassManager.h
832–835 ↗(On Diff #192549)

@chandlerc, was it intentional to have overlap/redundancy between UR.UpdatedC and CWorklist?

Asking because we ran into a pathological case where that redundancy and the one extra pass over a SCC caused significant compile time increase (by couple hours), when we switch a service to use new pass manager. There was no SCC mutation between the last two runs, and we run over that same SCC twice only because of the redundancy. (The one extra last run over it caused its size to grow exponentially, and that slowed down downstream passes)

I tried to tweak this to avoid that redundancy between the two - that resolved the compile time regression from newpm, and all llvm tests passed too (except the analysis counts for CGSCCPassManagerTest needs update, which is expected I think). But wanted to check if that's a sensible approach..

wenlei added inline comments.May 26 2020, 2:04 PM
llvm/trunk/include/llvm/Analysis/CGSCCPassManager.h
832–835 ↗(On Diff #192549)

Here's an attempt/rfc to address this FIXME and the compile time issue we saw with new pass manager: D80589