This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Remove the walk of the entire expression subgraph on every lookup of a cached SCEV for a particular value.
Needs ReviewPublic

Authored by chandlerc on Feb 9 2017, 2:08 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This was added waaaaay back in r185843 to handle the issue where
a SCEVUnknown wrapped around an IR value is deleted when the IR value is
deleted. It becomes null and makes the entire expression subgraph
containing it invalid. The fix was to check for these null nodes in the
subgraph on every query because SCEVs don't have uselists to update.
Unfortunately, this means querying every SCEV in a large graph is
trivially quadratic. Bad news. A great example of this is the unittest
ScalarEvolutionTests::SCEVAddExpr which becomes quite slow with a larger
number of expressions. When the loop was at 1000, the test took well
over 20 seconds for me.

Unfortunately, we *really* don't want to add uselist based invalidation
here because it will make removing instruction walk their subgraphs!
Especially considering that a common pattern is to delete the IR and
then forget the loop, this would be a big waste.

So this patch introduces a different approach. We add a generation count
to ScalarEvolution, and for composite expression SCEVs (SCEVExpr base
class in this patch, but a better name would be welocme) we track the
generation count at which the SCEV was valid. Then, each time the value
handle invalidates a SCEVUnknown node at the leaves of a subgraph it
also moves the generation count forward. When we validate, we check the
subgraphs with old generation counts, *and update the generation count
where valid*. We also clear out the part of the subgraph that we
trivially know is invalid rather than leaving it around to consume
memory. This essentially allows the validation of a subgraph to be
cached *between* queries to SCEVs.

Now, as long as the deletion of the IR and the queries to SCEV are
*batched*, the quadratic component falls away. This seems very likely to
be true in practice and certainly is true in the unittest. That unittest
with the loop size set to 1000 is over 5x faster with this patch.
Naturally, I can magnify this difference by making the size of the graph
larger. =]

When discussing this, there were some initial concerns around memory
overhead, but because this is isolated in compound expressions,
I suspect it will be fine. The largest overhead is likely from the unary
cast expressions, but even there it seems likely to be a reasonable cost
given the other things cached for every SCEV.

Unfortunately (but maybe fortunately?), this makes SCEV *much* more strict
about caching SCEVs across IR mutations because that can cause it to build new
expressions out of these cached SCEVs with skewed generations. The only failure
of this kind in 'check-llvm' was loop reroll which I've fixed here to forget
the loop after it deletes tons of instructions and not cache things across that
deletion.

However, if you run with this patch and assertions across the test suite, the
unroller also triggers the assert that catches skewed generations when building
new expresisons. This one is especially concerning as it appears to happen
before the unroller modifies any IR, making it look like some other pass forgot
to call forgetLoop? Unclear. It's also generally unclear whether we want this
degree of strictness, or if we don't, how/when we should re-validate cached
SCEVs.

I'm sadly out of time to keep hacking on this, but I wanted to send it out in
case others are interested in pushing this forward.

Event Timeline

chandlerc created this revision.Feb 9 2017, 2:08 PM