This is an archive of the discontinued LLVM Phabricator instance.

[MemorySSA] Fix exponential compile-time updating MemorySSA.
ClosedPublic

Authored by efriedma on Mar 20 2018, 3:47 PM.

Details

Summary

MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm, for each block, it computes the previous definition for each predecessor, then takes those definitions and combines them. But currently it doesn't remember results which it already computed; this means it can visit the same block multiple times, which adds up to exponential time overall.

To fix this, this patch adds a cache. If we computed the result for a block already, we don't need to visit it again because we'll come up with the same result. Well, unless we RAUW a MemoryPHI; in that case, the TrackingVH will be updated automatically.

I'm not sure this is the best fix, but it seems to work.

The testcase isn't really a test for the bug, but it adds coverage for the case where tryRemoveTrivialPhi erases an existing PHI node. (It's hard to write a good regression test for a performance issue.)

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma created this revision.Mar 20 2018, 3:47 PM

So, staring at https://pp.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf, i don't see how we can get exponential time, and thus wonder if i screwed this up somehow.
I can see we walk blocks to find the defs and they don't have to (because of how they use it), but that should only add a factor of O(N).

In particular, i believe each block should only ever be visited twice.

In particular, they should have the same time bounds as us, if both algorithms operate on blocks that each contains 1 instruction.
If the exponential factor comes from repeated walking, they should be just as exponential
Thus, my assumption is we screwed up.

(but otherwise, if we discover the paper is wrong/etc, this change looks reasnable)

I think the problem comes from the way we implement the marker variation.

In the main algorithm, readVariableRecursive only calls itself recursively in the case where a block has one predecessor. In this case, it's theoretically possible to visit a block an arbitrary number of times if you have a deeply nested if statement, but that's at worst quadratic in the number of nested if statements. (I'm not sure the complexity analysis in the paper accounts for this correctly, but it probably doesn't have much practical impact.)

The exponential-time problem comes from the additional recursive calls introduced by the marker variation, I think. The variation isn't fully described in the paper, but adding recursive calls for blocks with multiple predecessors makes the algorithm exponential if you don't use some sort of cache to stop the recursion.

dberlin requested changes to this revision.Mar 21 2018, 2:59 PM

(there is a longer report version that describes the marker variation in detail that is escaping me at the moment. There is also code that implements it in clang).

So staring, i think the main difference is in fact, like you say, that they have an effective cache and we don't.
In particular, the marker variation code looks at the variable map before continuing to traverse blocks, acting exactly as the cache you've implemented. It stores the ongoing results in writeVariable (which we don't have) to avoid having to find them again.
(I didn't notice this at first because the coding style is truly horrible).

So, IMHO, what you've done seems exactly right.

We should be updating the CachedPreviousDef for phis we insert/remove as well.
(IOW, it should be updated where you see the writevariable calls in the algorithm, i believe)

This revision now requires changes to proceed.Mar 21 2018, 2:59 PM
efriedma updated this revision to Diff 139654.Mar 23 2018, 2:37 PM

Update the MemoryAccess cache for blocks with multiple predecessors.

dberlin accepted this revision.Mar 23 2018, 3:18 PM

LGTM, thanks!

(I do wonder if we shouldn't add a unit test to the MemorySSAUpdater unit tests that just has 100 nested if or something that will take forever with exponential time but be fine with N^2 time, but otherwise, ...)

This revision is now accepted and ready to land.Mar 23 2018, 3:18 PM
This revision was automatically updated to reflect the committed changes.