This is an archive of the discontinued LLVM Phabricator instance.

[PM] Extend the explicit 'invalidate' method API on analysis results to accept an Invalidator that allows them to invalidate themselves if their dependencies are in turn invalidated.
ClosedPublic

Authored by chandlerc on Aug 19 2016, 8:33 PM.

Details

Summary

Rather than recording the dependency graph ahead of time when analysis
get results from other analyses, this simply lets each result trigger
the immediate invalidation of any analyses they actually depend on. They
do this in a way that has three nice properties:

  1. They don't have to handle transitive dependencies because the infrastructure will recurse for them.
  2. The invalidate methods are still called only once. We just dynamically discover the necessary topological ordering, everything is memoized nicely.
  3. The infrastructure still provides a default implementation and can access it so that only analyses which have dependencies need to do anything custom.

To make this work at all, the invalidation logic also has to defer the
deletion of the result objects themselves so that they can remain alive
until we have collected the complete set of results to invalidate.

This also requires that we not do the (very questionable) thing where we
mark invalidated analyses as preserved. I originally thought this would
make sense, but it really seems confusing and impractical. The only
reason to do it was to avoid re-invalidating analyses endlessly when
crossing IR-unit boundaries between analysis managers. I've solved that
in the previous patch by just having an IR-unit set, and having pass
manager mark that entire set as preserved. This seems dramatically
simpler and more robust anyways. And as a consequence, we lose nothing
by removing the constant marking of invalidated analyses as preserved.

A unittest is added here that has exactly the dependency pattern we are
concerned with. It hit the use-after-free described by Sean in much
detail in the long thread about analysis invalidation before this
change, and even in an intermediate form of this change where we failed
to defer the deletion of the result objects.

There is an important problem with doing dependency invalidation that
*isn't* solved here: we don't *enforce* that results correctly
invalidate all the analyses whose results they depend on.

I actually looked at what it would take to do that, and it isn't as hard
as I had thought but the complexity it introduces seems very likely to
outweigh the benefit. The technique would be to provide a base class for
an analysis result that would be populated with other results, and
automatically provide the invalidate method which immediately does the
correct thing. This approach has some nice pros IMO:

  • Handles the case we care about and nothing else: only *results* that depend on other analyses trigger extra invalidation.
  • Localized to the result rather than centralized in the analysis manager.
  • Ties the storage of the reference to another result to the triggering of the invalidation of that analysis.
  • Still supports extending invalidation in customized ways.

But the down sides here are:

  • Very heavy-weight meta-programming is needed to provide this base class.
  • Requires a pretty awful API for accessing the dependencies.

Ultimately, I fear it will not pull its weight. But we can re-evaluate
this at any point if we start discovering consistent problems where the
invalidation and dependencies get out of sync. It will fit as a clean
layer on top of the facilities in this patch that we can add if and when
we need it.

Note that I'm not really thrilled with the names for these APIs... The
name "Invalidator" seems ok but not great. The method name "invalidate"
also. But I've not come up with any better naming patterns.

I'm working on the actual fixes to various analyses that need to use these, but
I want to try to get tests for each of them so we don't regress. And those
changes are seperable and obvious so once this goes in I should be able to roll
them out throughout LLVM.

Event Timeline

chandlerc updated this revision to Diff 68762.Aug 19 2016, 8:33 PM
chandlerc retitled this revision from to [PM] Extend the explicit 'invalidate' method API on analysis results to accept an Invalidator that allows them to invalidate themselves if their dependencies are in turn invalidated..
chandlerc updated this object.
chandlerc added a subscriber: llvm-commits.
silvas added a subscriber: silvas.Aug 21 2016, 8:56 PM

This is a big improvement on the current state of things and can probably get rid of most if not all of the insufficient invalidation issues I was running into without a big refactor to pass stuff into the query path of a ton of analyses.

This is somewhat limited because it only works within a single IRUnit. So it won't cover cases like:

  • AAManager (function analysis) can be holding a pointer to GlobalsAA (module analysis)
  • LoopAccessAnalysis (loop analysis) holds pointers to various function analysis
  • Loop analyses implicitly hold pointers to Loop objects owned by LoopAnalysis. (analogously for CGSCC analyses and LazyCallGraph)

Those will likely not be an issue when running a standard pipeline as long as we don't have any transformations that preserve the inner proxies. E.g. during standard pipelines, when we invalidate module analyses and thus GlobalsAA, as long as transformations don't preserve the module to function proxy, the proxy will clear out the inner analysis manager.

In other words, the approach in this patch depends on having the for-loop in PreservedAnalyses invalidate(IRUnitT &IR, PreservedAnalyses PA) iterate over all analysis results that may need to be invalidated; however, since it is only iterating over the analysis results cached on a single IRUnit, any cross-IRUnit dependencies will not be correctly handled. (the invalidation of the inner proxies will invalidate analysis results cached on any inner IRUnitT's, so the main concern here is analysis results that hold pointers to analyses cached on their outer IRUnitT's or the their same IRUnitT)

IIRC, LoopAccessAnalysis was the only analysis that had data-structure-dependencies on other analyses (it holds SCEV* and Loop*) and so the cross-IRUnit case is probably worth thinking about long-term, even though currently I think we'll get away with it LoopAccessAnalysis getting invalidated by the function to loop proxy during standard pipelines.

FYI I'll be on vacation for the next two weeks, so feel free to move forward with this if others are onboard with the idea. I think this is a nice improvement.

This is a big improvement on the current state of things and can probably get rid of most if not all of the insufficient invalidation issues I was running into without a big refactor to pass stuff into the query path of a ton of analyses.

This is somewhat limited because it only works within a single IRUnit. So it won't cover cases like:

  • AAManager (function analysis) can be holding a pointer to GlobalsAA (module analysis)

Yea, I'm working on a follow-up patch that should address this. It's less pretty, but the fact that it is so much less common makes me hesitant to engineer a more heavyweight solution.

  • LoopAccessAnalysis (loop analysis) holds pointers to various function analysis

I'm specifically trying to end up with a solution that will work equally well here and for function analyses using module analyses.

I'll be looking at the loop stuff next though to help validate this or uncover any further issues.

  • Loop analyses implicitly hold pointers to Loop objects owned by LoopAnalysis. (analogously for CGSCC analyses and LazyCallGraph)

Both of these I think will be handled by the proxies, but it is certainly something that needs to be dealt with.

sanjoy removed a reviewer: sanjoy.
chandlerc updated this revision to Diff 69191.Aug 24 2016, 5:55 PM

Rebase and incorporate the changes to the CGSCC layer. No substantial changes
though.

Ping and adding Justin explicitly as we talked about this at the last social...

chandlerc updated this revision to Diff 72446.Sep 26 2016, 1:19 AM

Rebase and ping...

chandlerc updated this revision to Diff 73753.Oct 6 2016, 3:29 AM

Rebase and ping...

jlebar edited edge metadata.Oct 9 2016, 12:50 PM

I am still trying to understand why we've designed it this way, but here are some comments on comments.

include/llvm/IR/PassManager.h
357

I can't figure out what "half of a bijection" is, and this phrase apparently appears nowhere on the internet other than in this file. Can we rephrase?

357

Comma before "and" -- new independent clause.

361

An IRUnitT is not necessarily a function pointer, nor a pointer to an llvm::Function, right?

364

same here wrt "function pointer"

365

This comment in particular, but also the one above, just sort of spells out in English what the type is. It's definitely helpful to tell me what "void*" is, but you could write that as

std::pair</* AnalysisID */ void*, IRUnit *>

or even

using AnalysisID = void*;
std::pair<AnalysisID, IRUnit *>

Once we've made the code clearer in this way, perhaps we could use the comments to say something about what these types are for.

373–424

comma before "to"

377

Rephrase beginning at "and" -- bad parallelism.

378

Nit, we don't really pass the API per se. "We pass in an Invalidator", perhaps?

380

Rephrase starting with "rather than". I think this is important, because it's kind of the motivation for the whole design, but I don't grok what you're trying to get across.

395

"look up" is still the verb form. I am losing this fight, but I have at least a few more years before the space is gone forever. (The space makes the written form of the verb match the spoken form, which has two accented syllables -- the spoken noun form has only one accented syllable -- "LOOKup TAble", versus "LOOK UP 'TAble' in the dictionary".)

570

Kind of runs on -- split into two sentences?

Track whether each pass's result is invalidated.  Memoize the results using the IsResultInvalidated map.
575

type-erased

575

s/where/whereas/

575

s/but/and

576

again s/where/whereas/ (or "while") would help, I think.

Or maybe just

This is basically the same thing as Invalidator::invalidate, but we can't call it here because we're operating on the type-erased result.  Moreover if we instead called invalidate() directly, it would do an unnecessary lookup in ResultsList.

Or maybe even leave out that second sentence, because it's a premature optimization. :)

584

s/invalidator/Invalidator/?

594

No need to set up a contrast, I think. Maybe simply

Now erase the results that were invalidated.
604

(I don't know why we have both a list and a map when we have things like MapVector, but...okay, let's assume there's a good reason for this. :)

include/llvm/IR/PassManagerInternal.h
110

Based on this comment I have no idea what I am supposed to do with the InvalidatorT object.

lib/Analysis/CGSCCPassManager.cpp
74

Invaliadtion

(I wonder how hard it would be to make a good C++ spellchecker. You could imagine a tool that spellchecks only comments, and doesn't flag tokens that appear elsewhere in the code (outside comments)... Catching these typos doesn't seem like a good job for hoomans.)

((Now I'm thinking about using whatever tech Chrome's online spellchecker -- which is hauntingly accurate -- uses, not just to catch typos in comments, but also bugs in code...))

77

I don't understand this comment, particularly "with a set".

unittests/IR/PassManagerTest.cpp
392

s/analyses/analysis/

393

Kind of a run-on sentence, maybe split?

426

Can this constructor just take an std::function instead? That would mean you have to spell out the type of the std::function twice, but I think it would be more explicit.

450

Nit, missing period.

458

The "for one function invalidate" form is awkward for me. Maybe this would be easier to read as a bulleted list:

Next, invalidate
  - the indirect analysis for g(),
  - the underlying analysis for h(), and
  - both analyses for f().

It might also make sense to rejigger the code so that we can change the comment to read "f()", "g()", and "h()".

469

comma before "forcing"

479

ecah

483

sp, indiract

is invalidated

484

Not sure "total" is helpful.

488

Again not sure the second "total" helps.

jlebar added inline comments.Oct 9 2016, 9:46 PM
include/llvm/IR/PassManager.h
591

WRT the names:

I agree "invalidate" isn't a great name, because

if (!my_pass->invalidate(...)) {
  // my_pass was not invalidated!
}

checkIfInvalidated? onIRChange? notifyIRChanged? I like the first one because the bool return value sort of makes sense.

This would also change the comment above -- we're not really "trying" to invalidate the result. More informing it that something has changed, and giving it the option to mark itself as invalid.

I still don't really understand what we're supposed to do inside the invalidate function, so I'm not sure about the Invalidator class name. The only example I have is

bool TestIndirectFunctionAnalysis::invalidate(Function &F, const PreservedAnalyses &PA,
                FunctionAnalysisManager::Invalidator &Inv) {
  return !PA.preserved<TestIndirectFunctionAnalysis>() ||
         Inv.invalidate<TestFunctionAnalysis>(F, PA);
}

It naively seems to me that it would be better to structure things so that we never run this function if PA.preserved<TestIndirectFunctionAnalysis>(). The Invalidator could ensure this. Then our function could be

bool checkIfInvalidated(...) {
  return Inv.checkIfInvalidated<Dep1>() || Inv.checkIfInvalidated<Dep2>();
}

which actually makes sense to me. I think ideally Inv.checkIfInvalidated would take zero parameters, since it's just boilerplate, but maybe it should still take the Function/Module/whatever. We'd at least be able to get rid of the PreservedAnalyses argument, I think?

silvas added inline comments.Oct 13 2016, 2:11 AM
include/llvm/IR/PassManager.h
591

Currently, the invalidate method is a way to opt out of being invalidated (e.g. for TTI which knows it only depends on the target).

However, this cannot trigger invalidation of additional analysis result objects, which is what is needed when one analysis result holds pointers to another analysis results in order to avoid use-after free errors[1]. Hence the addition of the extra invalidation machinery in this patch. (this patch is by no means a comprehensive solution; I mentioned some problem areas in the initial comment in this review).

The Invalidator that is added in this patch basically provides a constrained interface to the analysis result list so that analysis result objects can transitively find and check whether any analysis result objects that they hold a pointer to have been invalidated. This allows invalidation to propagate from dependencies to dependents and trigger additional invalidation, avoiding use-after-free. (for layering reasons, analyses cannot know what will depend on them, but only what they depend on, so to do this with object recursion requires this process to be top-down dependent-to-dependency relying on memoization, rather than bottom-up dependency-to-dependent with a direct traversal)

Also, unfortunately, it is not entirely practical to avoid calling the invalidate method when PA.preserved<FooAnalysis>() is true. The difficulty again occurs when analysis result objects for FooAnalysis hold pointers to other analysis result objects. In that case, transformation passes would have to call PA.preserve<BarAnalysis>() for every analysis that FooAnalysis result objects transitively hold a pointer to (otherwise there would be a dangling pointer / use-after-free error when the BarAnalysis result object is invalidated due to not being in the "preserved" set). There is actually an additional difficulty due to AliasAnalysis, which depends on a dynamic (but fixed for a given pass pipeline invocation) set of type-erased analyses; hence there is no static set of PA.preserve<...>() calls which is necessarily correct.

I don't think that the PreservedAnalyses argument can be eliminated, because the !PA.preserved<FooAnalysis>() check is essentially the base-case of the object recursion facilitated by the Invalidator. I.e. it determines the roots from which invalidation must be propagated from dependency to dependent (although for layering reasons the object recursion goes from dependent to dependency, with memoization to keep the complexity under control).

[1] This is actually pervasive. Running any non-trivial pipeline on any non-trivial piece of code will currently trigger numerous use-after free errors with the new pass manager.

chandlerc marked 27 inline comments as done.Nov 22 2016, 2:39 PM

Whew, finally through this. Thanks so much Justin for the review. Responses to a few issues inline, and an updated patch on its way.

include/llvm/IR/PassManager.h
361

Just a stale comment.

365

True, the comment here is just trying to explain the type itself. The members below that use them I think have more helpful comments?

I could try to clean up the type names with aliases and remove the comments if you like. But maybe in a follow-up patch? It seems orthogonal to what this is trying to do.

I've cleaned up the comments very minorly here, but not in any way that substantively addresses this.

380

I've tried to clarify this. Keep complaining until it makes sense though.

576

I like your wording much better than mine. Stolen.

And I still want to document that we're dodging a map lookup -- if we go back, the author should be aware of this even if it is worth the tradeoff.

591

First and foremost, I'm down with renaming this, but I'd really like that to be a separate patch. It'll be a ton of boring changes that I don't want to try and juggle in this commit.

Second, regarding the preserved analysis argument and the IR unit argument: I think they both have good reasons to exist.

  1. Transformations may want to preserve an entire set of analyses, and analyses should query in the preserved analyses to see if a set that covers them has been preserved even if the individual analyses has not been.
  2. Analyses may want to check trivial information about the IR to determine from first principles that they have been preserved.

And as Sean points out, this is really just about adding the first of several pieces of infrastructure necessary to make invalidation reliable.

I'd like to brainstorm different names for this outside the review though to see if there is a clearly superior alternative.

include/llvm/IR/PassManagerInternal.h
110

Yea, I failed to update this. Fixed.

lib/Analysis/CGSCCPassManager.cpp
77

Tried better words.

unittests/IR/PassManagerTest.cpp
458

Not sure what rejiggering you had in mind here, but the comment seems sane now?

chandlerc updated this revision to Diff 78952.Nov 22 2016, 2:40 PM
chandlerc marked 3 inline comments as done.
chandlerc edited edge metadata.

Update with fixes to Justin's comments.

chandlerc updated this revision to Diff 79130.Nov 23 2016, 11:22 AM

Rebase now that we're using more clear types for the static key which is used
to compute an analysis ID.

chandlerc updated this revision to Diff 79386.Nov 28 2016, 3:30 AM

Rebase, especially picking up some more underlying refactorings and
improvements that simplify this patch.

jlebar accepted this revision.Nov 28 2016, 11:17 AM
jlebar edited edge metadata.

This looks good to me!

include/llvm/IR/PassManager.h
357

Is "pass ID" still the right way to describe the first element of the pair?

385

Should we comment what this returns? It's not obvious to me from the name, and the body is nontrivial.

591

I think I understand why you want the three args; thanks Sean and Chandler for the explanation.

I'd like to brainstorm different names for this outside the review though to see if there is a clearly superior alternative.

sgtm. What is the right venue for this discussion? We can seed it with my comment from above.

include/llvm/IR/PassManagerInternal.h
106

s/analises/analyses'/?

unittests/IR/PassManagerTest.cpp
458

Yes, seems sane now.

This revision is now accepted and ready to land.Nov 28 2016, 11:17 AM
chandlerc marked 2 inline comments as done.Nov 28 2016, 2:08 PM

Thanks so much! Submitting with suggested changes and getting the next patch in my queue ready for review.

include/llvm/IR/PassManager.h
357

switched to 'analysis ID' which seems more clear.

385

Yea, this was just completely missing documentation. Added. Happy to tweak and/or improve in follow-up commits.

591

I'll chat with you and maybe others on IRC first to get some ideas, and then I'll send a dedicated patch to do this.

This revision was automatically updated to reflect the committed changes.
chandlerc marked an inline comment as done.