Rather than recording the dependency graph ahead of time when analyses
get results from other analyses, this simply lets each result trigger
the immediate invalidation of any analyses it actually depends on. Results
do this in a way that has three nice properties (sketched below):
- They don't have to handle transitive dependencies because the infrastructure will recurse for them.
- The invalidate methods are still called only once. We just dynamically discover the necessary topological ordering, and everything is memoized nicely.
- The infrastructure still provides a default implementation and can dispatch to it, so only analyses which have dependencies need to do anything custom.
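For example, a result that grabbed another analysis' result when it was
computed can override the hook and forward the query through the Invalidator.
This is only a sketch; "ExampleResult" and "DependedOnAnalysis" are
illustrative names, not anything in this patch:

    #include "llvm/IR/Function.h"
    #include "llvm/IR/PassManager.h"
    using namespace llvm;

    struct ExampleResult {
      // Cached when this result was computed by its analysis' run() method.
      DependedOnAnalysis::Result *DepResult;

      // Hook the analysis manager calls while deciding what to invalidate.
      bool invalidate(Function &F, const PreservedAnalyses &PA,
                      FunctionAnalysisManager::Invalidator &Inv) {
        // If the analysis we depend on is invalidated, so are we. The
        // Invalidator recurses through transitive dependencies and memoizes,
        // so each result's invalidate is still called only once.
        // (A real result would usually also consult PA about itself; only the
        // dependency-forwarding part is shown here.)
        return Inv.invalidate<DependedOnAnalysis>(F, PA);
      }
    };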
To make this work at all, the invalidation logic also has to defer the
deletion of the result objects themselves so that they can remain alive
until we have collected the complete set of results to invalidate.
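Concretely, the deferral amounts to a two-phase sweep: first decide which
results to invalidate while all of them are still alive (they may consult one
another through the Invalidator), then erase them. A minimal, self-contained
analogy of that shape, not the actual manager code:

    #include <map>
    #include <memory>
    #include <vector>

    struct Result {
      virtual ~Result() = default;
      virtual bool invalidate() { return true; } // default: not preserved
    };

    void sweep(std::map<int, std::unique_ptr<Result>> &Results) {
      std::vector<int> ToErase;
      // Phase 1: query every result; nothing is deleted yet, so results can
      // still safely look at each other.
      for (auto &KV : Results)
        if (KV.second->invalidate())
          ToErase.push_back(KV.first);
      // Phase 2: only now destroy the invalidated results.
      for (int Key : ToErase)
        Results.erase(Key);
    }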
This also requires that we not do the (very questionable) thing where we
mark invalidated analyses as preserved. I originally thought this would
make sense, but it really seems confusing and impractical. The only
reason to do it was to avoid re-invalidating analyses endlessly when
crossing IR-unit boundaries between analysis managers. I've solved that
in the previous patch by just having an IR-unit set, and having the pass
manager mark that entire set as preserved. This seems dramatically
simpler and more robust anyway. And as a consequence, we lose nothing
by removing the constant marking of invalidated analyses as preserved.
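For reference, with that IR-unit set the pass manager can say "everything
cached on this function is still valid" in one shot. Assuming the in-tree
spellings AllAnalysesOn<Function> and PreservedAnalyses::preserveSet (and a
hypothetical SomePass), that looks roughly like:

    // Rough sketch, not verbatim from this patch.
    PreservedAnalyses PA = SomePass.run(F, FAM);
    // Mark the whole function-analysis set preserved rather than marking
    // individual (possibly already invalidated) analyses as preserved.
    PA.preserveSet<AllAnalysesOn<Function>>();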
A unittest is added here that has exactly the dependency pattern we are
concerned with. It hit the use-after-free that Sean described in detail in
the long thread about analysis invalidation before this change, and it even
hit that bug in an intermediate form of this change where we failed to defer
the deletion of the result objects.
There is an important problem with doing dependency invalidation that
*isn't* solved here: we don't *enforce* that results correctly
invalidate all the analyses whose results they depend on.
I actually looked at what it would take to do that, and it isn't as hard
as I had thought, but the complexity it introduces seems very likely to
outweigh the benefit. The technique would be to provide a base class for
an analysis result that would be populated with other results, and
automatically provide the invalidate method which immediately does the
correct thing (sketched further below). This approach has some nice pros IMO:
- Handles the case we care about and nothing else: only *results* that depend on other analyses trigger extra invalidation.
- Localized to the result rather than centralized in the analysis manager.
- Ties the storage of the reference to another result to the triggering of the invalidation of that analysis.
- Still supports extending invalidation in customized ways.
But the downsides here are:
- Very heavy-weight meta-programming is needed to provide this base class.
- Requires a pretty awful API for accessing the dependencies.
Ultimately, I fear it will not pull its weight. But we can re-evaluate
this at any point if we start discovering consistent problems where the
invalidation and dependencies get out of sync. It will fit as a clean
layer on top of the facilities in this patch that we can add if and when
we need it.
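Just to make the rejected idea concrete, the base class might look very
roughly like the following. This is hypothetical and not part of this patch;
the real version would need much heavier machinery, in particular to store
and expose the dependency results:

    // Hypothetical sketch only. A result would inherit from this and list the
    // analyses it depends on as template arguments.
    template <typename... DepAnalyses>
    struct ResultWithDependencies {
      bool invalidate(Function &F, const PreservedAnalyses &PA,
                      FunctionAnalysisManager::Invalidator &Inv) {
        // Invalid if any listed dependency is invalidated (C++17 fold).
        bool Invalidated = false;
        ((Invalidated |= Inv.invalidate<DepAnalyses>(F, PA)), ...);
        return Invalidated;
      }
      // Storing the dependency results themselves (e.g. in a std::tuple of
      // result pointers) and giving the derived result access to them is
      // where the heavy meta-programming and the awkward accessor API come in.
    };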
Note that I'm not really thrilled with the names for these APIs... The
name "Invalidator" seems ok but not great. The method name "invalidate"
also. But I've not come up with any better naming patterns.
I'm working on the actual fixes to various analyses that need to use these, but
I want to try to get tests for each of them so we don't regress. And those
changes are separable and obvious, so once this goes in I should be able to roll
them out throughout LLVM.