Page MenuHomePhabricator

[PM] Introduce the facilities for registering cross-IR-unit dependencies that require deferred invalidation.

Authored by chandlerc on Nov 29 2016, 3:51 AM.



This handles the other real-world invalidation scenario that we have
cases of: a function analysis which caches references to a module
analysis. We currently do this in the AA aggregation layer and might
well do this in other places as well.

Since this is relative rare, the technique is somewhat more cumbersome.
Analyses need to register themselves when accessing the outer analysis
manager's proxy. This proxy is already necessarily present to allow
access to the outer IR unit's analyses. By registering here we can track
and trigger invalidation when that outer analysis goes away.

To make this work we need to enhance the PreservedAnalyses
infrastructure to support a (slightly) more explicit model for "sets" of
analyses, and allow abandoning a single specific analyses even when
a set covering that analysis is preserved. That allows us to describe
the scenario of preserving all Function analyses *except* for the one
where deferred invalidation has triggered.

We also need to teach the invalidator API to support direct ID calls
instead of always going through a template to dispatch so that we can
just record the ID mapping.

I've introduced testing of all of this both for simple module<->function
cases as well as for more complex cases involving a CGSCC layer.

Much like the previous patch I've not tried to fully update the loop
pass management layer because that layer is due to be heavily reworked
to use similar techniques to the CGSCC to handle updates. As that
happens, we'll have a better testing basis for adding support like this.

Depends on D27197.

Diff Detail


Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
chandlerc added inline comments.Dec 9 2016, 10:43 PM
119 ↗(On Diff #79533)

I'm so bad at commas... I hope this is better now...

506 ↗(On Diff #79533)

Done and factored into a common routine. Came up with a nice way to have a single implementation that is fast when it can be fast but generic/type-erased when it needs to be.

675 ↗(On Diff #79533)

Reworded to hopefully make this more clear.

431 ↗(On Diff #79533)

I don't actually think of it this way.

I think there is a common underlying idea of a pass, and there are two primary special cases: analysis passes and transformation passes.

I understand that many (most?) analysis passes tend to be trivial and we instead focus on the analysis result and caching it, but I don't want to neglect the fact that it is a pass that gets run over the IR. In that sense, the AnalysisManager *is* a `pass manager as well.

Anyways, I'm happy to spend some time debating this long term, but I'm not sure it's the right focus of this code review....

jlebar added inline comments.Dec 10 2016, 1:10 AM
103 ↗(On Diff #79533)

Anyways, any better API ideas here are very, very welcome. =/

One idea was up earlier in the review:

Could we require that these abstract sets inherit from some type [or otherwise have some way to tell the difference between an analysis and a set of analyses]?

At least then PA could catch some incorrect uses of its API.

(I understand that is solving a different problem than the one I was originally commenting on here.)

In terms of this problem, are you saying that we can neither

a) have an abstract analysis set enumerate its passes, nor can we
b) have an analysis enumerate its abstract analysis sets,

because the only layer that knows about all of the relevant sets and passes does not declare the sets or the passes? If so this seems remarkably fragile, to the point that I would want to step back and consider whether the layering we've imposed is actually helpful -- that is, whether the design space is overconstrained.

TBH I am pretty concerned that nobody other than you and Sean is going to be smart enough to program this correctly. For example, writing

!PA.preserved<CGSCCAnalysisManagerModuleProxy, AllAnalysesOn<Module>>()

requires global knowledge of LLVM that the only analysis set that covers CGSCCAnalysisManagerModuleProxy is AllAnalysesOn<Module>. (Or it somehow requires even more arcane knowledge that AllAnalysesOn<Module> is the only set that we need to enumerate here.) If you get it wrong, things will mostly work, until they don't, so these are not going to be easy bugs to find.

My understanding is that a lot of the design complexity is motivated by a desire to allow outside-tree users to provide new kinds of PMs. That's a laudable goal, but on average I would expect out-of-tree users to have less knowledge of LLVM core than your average core developer, so such an API is really only useful if it's hard to screw up. If we can't make it hard for them to do the wrong thing, and if providing this loose-coupling mechanism adds substantial complexity to our internal design, I am personally not convinced we are making the right design tradeoffs.

431 ↗(On Diff #79533)
I'm not sure it's the right focus of this code review....

I tend to lose state on anything I'm not working on in a week. If you start the discussion in the forum of your choosing Monday, great. If you start a thread in two or three weeks, you will probably still have state, but I may not, and then I will either decide I no longer care, or have to come back and page all this back in. Either way is not fun.

In fact I now vaguely recall that we had an outstanding question from a previous patch that we said we'd discuss outside the review. Maybe we did come back and it's been resolved? I am sort of a goldfish.

Because I've now been thinking about it, let me just say what I have in mind so we can capture it somewhere. I hope that's OK.

the AnalysisManager *is* a `pass manager as well.

I would ask a slightly different question. Instead of "is AM technically a PM?", I'd ask, "is it *useful* to an engineer of average skill to think of AM as a PM, and to think of Analyses as Passes?"

One can even have comments on AM explaining this technicality if it's useful to understand when thinking about the PM/AM framework. But then, is this happenstance of abstraction *so fundamental* that we should also use it everywhere else? Maybe I haven't fully understood the code because I don't grok that an analysis is just a monoid in the category of endofunctors^W^W^W^W^W^Wpass. :)

Personally I think "pass" is a useful word because "optimization pass" was a term I knew before I started working on compilers. But like Sean I am not sure it helps me more than it hurts to think of analyses as the same sort of thing.

A couple suggestions to make this patch more understadable.

75 ↗(On Diff #80994)

The comment on this class needs to be beefed up in response to this patch. The original idea behind PreservedAnalyses was that it was a whitelist of analyses to preserve, so that invalidation was conservatively correct. The addition of NotPreservedAnalysisIDs breaks with this approach and adds a layer of complexity that isn't adequately addressed by the current comment.

109 ↗(On Diff #80994)

This restriction seems like it stems from an implementation detail. Some implementation-level comment should explain its origin.

244 ↗(On Diff #80994)

This restriction about no sets in NotPreservedAnalysisIDs needs to be explained better in the comments here. More generally, there is a clear asymmetry between the PreservedAnalysisIDs and NotPreservedAnalysisIDs that needs to be explained (not just the "rules", but *why* the rules are there). Maybe that is appropriate for the class comment?

103 ↗(On Diff #79533)

In terms of this problem, are you saying that we can neither

a) have an abstract analysis set enumerate its passes, nor can we
b) have an analysis enumerate its abstract analysis sets,

b) is possible. It's just somewhat inconvenient right now because that information is hidden in the invalidate method which is on the analysis result object instead of the analysis itself.

TBH I am pretty concerned that nobody other than you and Sean is going to be smart enough to program this correctly.

For the record, I'm not convinced that I would be able to program this correctly. I don't like the approach that Chandler is taking here for precisely this reason. Explicitly tracking dependencies between analysis results so that there is a clear single point of truth in the analysis manager for the primitive operation "I need to invalidate analysis result X, invalidate all analysis results that depend on it" makes all of this so much easier.

If you haven't read it yet and want to understand the problem of analysis result invalidation better, I highly recommend reading (or at least skimming) the thread "[PM] I think that the new PM needs to learn about inter-analysis dependencies...":
Especially this post:

FYI, working on one API improvement idea and on addressing the tactical comments from Sean that are spot on here, but wanted to reply to the two discussion threads...

103 ↗(On Diff #79533)

Justin wrote:

In terms of this problem, are you saying that we can neither

a) have an abstract analysis set enumerate its passes, nor can we
b) have an analysis enumerate its abstract analysis sets,

I believe that we have (b) -> the analysis enumerates these in its result's invalidate routine.

The problem is that the API for doing this isn't good. I have some ideas about improving the API after thinking more on it. One thing that I did experiment with and continue to dislike is trying to do this *declaratively*. Every version of that I've come up with has been, IMO, much harder to understand.

TBH I am pretty concerned that nobody other than you and Sean is going to be smart enough to program this correctly.

I don't think this is about smarts. =] I think this is largely a problem of documentation and API design. I think your review is helping both of those.

I also think it is important to understand how rarely this complexity will come up. Most analyses will simply:

  1. Use the default invalidation logic which works out of the box, or
  2. Simply declare that they are never invalidated because they are fundamentally immutable or self-updating, or
  3. Implement an invalidate routine that checks a few common sets like 'CFG' in addition to themselves.

Everything else is relatively rare. The next most common case are analyses which embed references to other analyses in their results. Some of these are because it was easy rather than because it was the right design. But some will need to use the Invalidator logic provided in the previous review.

Most of the facilities I'm adding in this patch to be very rarely used. That doesn't mean it gets a free pass of course, it still needs to be clearly documented and have examples that show how to use it and not be easy to misuse in subtle ways.

The facility I am most concerned about (and I called it out, and you called it out) is just letting an analysis result check an additional set or two. That is currently too confusing, agreed, and I'd like a better API for that. I'm experimenting with the idea you suggested Justin and I think it might help, but I can't yet be certain. I'll update the patch if/when I get something interesting.

I also don't think we should strive for perfection in a single patch if there aren't terribly good ideas yet.

Regarding the meta point Sean, I continue to think that unifying the analysis management is the wrong design. I think it creates serious issues when expressing analyses on IR units defined by analyses, which is functionality that I very much want in the design.

431 ↗(On Diff #79533)

Totally good to capture it. =] I'm hoping to discuss this more with you on Monday though and we can kick off some ideas on the list and/or IRC. I wasn't planning on waiting weeks and weeks.

chandlerc updated this revision to Diff 81363.Dec 14 2016, 3:59 AM
chandlerc marked an inline comment as done.

Substantial rework of the documentation and API for the PreservedAnalyses

Ok, two significant changes here:

  1. I've made the sets use a distinct key type from the analyses so they are clearly independent things. I've also separated the APIs dealing with them so that things are more explicit.
  1. Justin and I sat down together to try to at least remove the "magical" aspect of the query API on PreservedAnalyses. The result is a very different API. It is a bit more heavyweight, but now the set relationships can be explicit logical relation ships of ||s and &&s rather than a list of things. This seems both more expressive and also more readable. It seemed to make the code implementing the invalidate method somewhat more comprehensible to Justin at least. But I'd like general feedback on this API design.

As Sean had suggested (thanks!) and then amplified by #2, I've rewritten the high level documentation for PreservedAnalyses and I've written much more comprehensive unit testing. It now clearly needs to live in a separate file IMO, but I want to do that code movement as a follow-up patch if that's OK.

I also spent some time trying to understand why elements of this are confusing, especially at first. I think Justin had a great insight here that the names of the critical component--the proxy analyses--give the reader no good anchor to what each one *is*. Once that is established, understanding the code becomes much easier.

The current plan is to rename the proxies from 'FunctionAnalysisManagerModuleProxy' to 'ProxyModuleAnalysis<FunctionAnalysisManager>'. This does two things that seem to help. One is that it puts 'ModuleAnalysis' early and contiguous in the name anchoring the reader that this is a module analysis first and foremost. The second is that it sinks the thing being proxied into a clearly subordinate position. This too seems like it should be its own patch.

A final thought is that there is currently some duplication of code between the two ProxyModuleAnalysis results' invalidate implementation that it turns out I *can* nicely factor out. I'm happy to do that in this patch or a follow-up, whatever folks prefer. I just wanted to update the patch now that the API rework is in place.

Some further replies below...

75 ↗(On Diff #80994)

I've written new documentation that tries to do this. It may not be good or enough. Let me know how this does and what else I can do here.

109 ↗(On Diff #80994)

It actually is more of an interface and semantic simplification in my mind... What does it mean to abandon a set? Does that abandon even explicitly preserved analyses? I would assume so (that's how analysis abandonment works), but instead we might just remove the set. Spelling that out will be necessary.

Also, supporting abandonment of sets makes the API for querying even more constrained, and the complexity of the interface was one thing that was raised as an unfortunate complexity in this patch.

With the new PreservedAnalyses API the code is simpler but we now can't express abandoned analysis sets reasonably at the interface level if they actually preclude individual analysis preservation.

I'm not sure what documentation would help here though... thoughts?

244 ↗(On Diff #80994)

See above... in some ways the asymmetry is worse (we have it at the typesystem level). But I'm not sure how best to document this. Suggestions would really be helpful here. The why is both "we don't need it, so why add it?" coupled with the fact that adding support for it introduces non-trivial complexity to the API.

Like I said, I'm very happy to add documentation that you think would help in light of the new API, just need to know what and where.

A couple nits. Overall, this is looking a lot better. It's awesome that you sat down with Justin to hash this out.

104 ↗(On Diff #81363)

This example is a bit confusing. Where is PAC used?

113 ↗(On Diff #81363)

What is "name"? Do you mean "analysis" or "analysis key" or something?

chandlerc updated this revision to Diff 81507.Dec 14 2016, 5:54 PM

Address thinkos in the comment spotted by Sean.

chandlerc marked 2 inline comments as done.Dec 14 2016, 5:54 PM
chandlerc added inline comments.
104 ↗(On Diff #81363)

Doh! Good catch....

113 ↗(On Diff #81363)

Yea, I have no idea. I meant what you said - "analysis".

jlebar edited edge metadata.Dec 15 2016, 3:38 PM

This looks good to me, and I'm basically ready to approve the patch, but there are a few new non-comment questions buried in here that I'd like to get resolved first.

675 ↗(On Diff #79533)

With your latest change to the PA interface, can we revert this change and instead do

if (PA.areAllPreserved() || PA.allAnalysesOnSetPreserved<AllAnalysesOn<IRUnitT>>())


72 ↗(On Diff #81507)


73 ↗(On Diff #81507)

I know what you mean, but the "rather than" part doesn't make sense -- *analyses* would never have to enumerate every analysis that's preserved. That's the job of the *transformations*.

86 ↗(On Diff #81507)

to indicate that they preserve

87 ↗(On Diff #81507)

set off "such as its CFG" with parens

90 ↗(On Diff #81507)

", which"

94 ↗(On Diff #81507)

First sentence could be clarified / simplified:

Given a PreservedAnalyses object built up by a transformation, an analysis will typically want to figure out whether it is preserved.

95 ↗(On Diff #81507)

s/are expected to typically be part of sets/are usually covered by one or more sets/

96 ↗(On Diff #81507)

"can" suggests that they have an option, but it's kind of the only choice.

101 ↗(On Diff #81507)

Suggest simplifying this whole para. Just introduce the idea and give the example.

Given a PreservedAnalyses object, an analysis will typically want to figure out whether it is preserved. In the example below, MyAnalysisType is preserved if it's not abandoned, and (a) it's explicitly marked as preserved, (b), the set AllAnalysesOn<MyIRUnit> is preserved, or (c) both AnalysisSetA and AnalysisSetB are preserved.

113 ↗(On Diff #81507)

Not sure this para is necessary with the rewrite above.

117 ↗(On Diff #81507)

Would suggest making this active, like the suggestion above:

You can also ask a PreservedAnalyses object whether all analyses in a particular set are preserved. If *any* analyses have been abandoned, this *always* returns false, because PreservedAnalyses does not have a priori knowledge of which analyses are in which sets.

Alternatively, maybe this isn't necessary to include in this comment at all; it's kind of an edge case, and we don't have to enumerate the whole API here.

133 ↗(On Diff #81507)


Mark a particular analysis as preserved, given a pointer to its AnalysisKey.

or something. The current way of distinguishing between this and the one above -- "a particular analysis" versus "an abstract analysis ID" -- is not facile.

Same below.

157 ↗(On Diff #81507)

"if it's covered" "was previously marked as preserved".

164 ↗(On Diff #81507)

Again here, we are still marking the analysis -- not an ID -- as abandoned. The difference is in what we're *given*, not really what we *do*.

176 ↗(On Diff #81507)

I am not sure either of these comments in the body are necessary, personally. They seem to repeat the code and the function-level comment.

190 ↗(On Diff #81507)

Nit, "not-preserved".

227 ↗(On Diff #81507)

Perhaps this sentence should live on the constructor:

We take an AnalysisKey in our constructor because we need to know ...

I think maybe you had this comment here because you wanted to clarify what preserved and preservedSet return without writing repetitive comments? I think it's probably worth having brief comments there:

/// Returns true if our analysis was not abandoned and (a) the analysis was explicitly preserved, or (b) all analyses were preserved.

/// Returns true if our analysis was not abandoned and (a) the set was explicitly preserved, or (b) all analyses were preserved.

253 ↗(On Diff #81507)

s/in turn//

253 ↗(On Diff #81507)


254 ↗(On Diff #81507)

The prep phrase starting with "for" doesn't make much sense. Maybe say:

You can use this object to query whether an analysis was preserved. See the example in the comment on PreservedAnalysis.

262 ↗(On Diff #81507)

Same comments here.

268 ↗(On Diff #81507)

Suggest being explicit "preserved (and none are abandoned)."

270 ↗(On Diff #81507)

Suggest something like

This lets analyses optimize for the common case where a transformation made no changes to the IR.

280 ↗(On Diff #81507)

Suggest deleting starting with "and" -- it confuses more than it helps.

288 ↗(On Diff #81507)


297 ↗(On Diff #81507)


The analyses and analysis sets that are preserved.

Invariant: A given AnalysisKey is never in both PreservedIDs set and NotPreservedAnalysisIDs.

300 ↗(On Diff #81507)

I think this is pretty obvious, not sure it's necessary to say.

307 ↗(On Diff #81507)

Now can remove

and should not include any synthetic set IDs such as the "all" ID.

because type-safety. Suggest rewriting para to

An analysis cannot be in both PreservedIDs and NotPreservedAnalysisIDs. If an analysis is covered by a set in PreservedIDs but is in NotPreservedAnalysisIDs, we consider it not-preserved. That is, NotPreservedAnalysisIDs always "wins" over analysis sets in PreservedIDs.

560 ↗(On Diff #81507)

", which" and then start a new sentence at "but".

There is a rule for the comma here: If you have a "which/that/who" phrase that is not "narrowing", you almost always offset the phrase with commas. A "narrowing" "which/that/who" phrase restricts the meaning of the phrase before: "My coworker who uses ed is out sick." (No commas, I have more than one coworker.) If the phrase is not narrowing, it gets a comma: "My dad, who is a programmer, lives in CA." (Commas; I have only one dad.)

Like everything in English, it depends, but this one is relatively safe.

562 ↗(On Diff #81507)


Although, is the caller really erasing the types itself? Maybe you can just delete starting with "but".

570 ↗(On Diff #81507)


573 ↗(On Diff #81507)

Not sure we need the sentence starting with "This implementation" -- it seems pretty clear.

117 ↗(On Diff #81507)

Suggest s/without.*//

37 ↗(On Diff #81507)

You don't want to check an AllAnalysesOn here too?

828 ↗(On Diff #81507)

", and" (comma separates independent clauses)

chandlerc updated this revision to Diff 81724.Dec 16 2016, 1:50 AM
chandlerc marked 33 inline comments as done.
chandlerc edited edge metadata.

Update with fixes for Justin's round of comments.

Patch updated with fixes, see responses below.

675 ↗(On Diff #79533)

We can. It will never fire though because we instead do this test before walking all the IR units so that if we have 10million functions we don't query the PA 10 million times for this...

But we already query it for the all preserved.... so it seems silly in retrospect.

And what's more, the all query should really be handled in the allAnalysesOnSetPreserved, making this quite nice now.

We can keep the optimization in the callers that do this in a tight loop.

133 ↗(On Diff #81507)

How about "an analysis type" vs. "an analysis ID"?

157 ↗(On Diff #81507)

No, the *set* isn't even temporal. An abandoned analysis is subtracted from all sets at query time.

164 ↗(On Diff #81507)

Does clear terms (type vs. ID) help here? Or do you really want the verb moved around?

227 ↗(On Diff #81507)

Done but with slightly different wording. See what you think.

297 ↗(On Diff #81507)

Took a slightly different approach but same spirit.

560 ↗(On Diff #81507)

My problem is not that I'm not familiar with the rule about narrowing vs. non-narrowing, it is two-fold:

  1. When writing, I cannot keep both these rules and what I am trying to say in my head.
  1. When writing and reading my own text, I often end up with a very different interpretation of what it means to be a narrowing phrase. I think I read restrictions of meaning into things that English says aren't actually narrowing phrases.

Sorry. =/ I've been failing at this aspect of writing for over 15 years I fear.

37 ↗(On Diff #81507)

I can, I just wasn't bothering because this isn't really tested or updated at all yet. I just made it compile and behave exactly as it did before.

jlebar accepted this revision.Dec 16 2016, 9:37 AM
jlebar edited edge metadata.
jlebar added inline comments.
133 ↗(On Diff #81507)

If all I had were the comments and function signatures, I think I might still find this confusing -- is marking the analysis ID as preserved somehow different than marking a type as preserved? Given the implementation of the first function, it's pretty clear to me, though, so I think it's probably fine if you want to do it this way.

157 ↗(On Diff #81507)

Mark an analysis type as abandoned, removing it from the preserved set even if covered by some other set or previously explicitly marked as preserved.


Mark an analysis type as abandoned, removing it from the preserved set even if it's covered by some other set or was previously explicitly marked as preserved.

To me, these two sentences have no semantic difference -- the first one is just eliding some verbs. It sounds like these two mean something different to you?

Based on these comments, I think maybe you want to get across that abandoning an analysis undoes explicit preservation, but that "implicit" preservation via a set does not undo abandonment. If that's it, how about:

Mark an analysis type as abandoned. An abandoned analysis is not part of the preserved set, even if it is nominally covered by some other set or was previously explicitly marked as preserved.

possibly s/is not part/will not be part/
possibly s/is not part/is not considered part/

If you think that's still not clear, maybe we just need an example; that would make it unambiguous. This a tricky but important API.

164 ↗(On Diff #81507)

I'm fine with the "analysis ID" change; see above for the verb issue.

560 ↗(On Diff #81507)

Heh, okay. Sorry that wasn't helpful.

219 ↗(On Diff #81724)

Kind of runs on with "in order to skip them". I'd set it off in parens, but maybe it's just not necessary:

Both \c preserved() and \c preservedSet() need to check whether the analysis was abandoned, so we take the analysis ID here and cache whether it was abandoned.

Or if you want to reveal fewer implementation details:

A PreservedAnalysisChecker is tied to a particular Analysis because \c preserved() and \c preservedSet() both return false if the Analysis was abandoned.

271 ↗(On Diff #81724)

have been (or "no analysis has been" if you like)

278 ↗(On Diff #81724)


560 ↗(On Diff #81724)

Now that I see this -- do we need this comment at all? It's abundantly clear that this is a helper, and it's a private function.

chandlerc updated this revision to Diff 81932.Dec 19 2016, 2:04 AM
chandlerc marked 4 inline comments as done.
chandlerc edited edge metadata.

Update with even better comments thanks to Justin!

133 ↗(On Diff #81507)

Ah, I think I see a better way of wording this. Try now?

157 ↗(On Diff #81507)

I like your second wording attempt. That really gets at the heart of it I think. I'm happy to add an example if it helps though.

FYI (in case it is confusing) Justin marked this as LGTM already and I've addressed all the comments, so I'm going to land it to unblock further testing.

When Justin is back from vacation, I'll still circle back around with him to make sure the comments are in a state he's happy with.

This revision was automatically updated to reflect the committed changes.