This is an archive of the discontinued LLVM Phabricator instance.

[BasicAA] Add depth limit
Changes PlannedPublic

Authored by nikic on Feb 13 2021, 7:04 AM.

Details

Summary

This patch adds a depth limit for the recursive alias analysis queries in BasicAA. Unfortunately, this is pretty tricky, because we need to think about how this interacts with BatchAA. Just doing a naive cutoff means that BatchAA can produce worse or better results depending on the order of queries, which is something we want to avoid. For this reason, this patch tracks the starting depth and maximum depth for cache entries, to determine whether a cached result can be used at a given depth.

For the depth limit of 8 used here, we get the following stats differences at O3:

aa.NumMayAlias | 5010256 | 5010623
aa.NumNoAlias | 27200820 | 27200306

This is also a minor compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=20cb6c7cebb5a3566640a9bf0da4729993ce6020&to=5e9effe146ed5d2a3934484db2493ffd66b0b5df&stat=instructions You can also see that with ThinLTO mafft regresses (0.87% codesize increase).

This addresses https://bugs.llvm.org/show_bug.cgi?id=49151, where BasicAA recurses enough to overflow the stack. Possibly D92401 could be reapplied, as it ran into exponential complexity in some cases.

Diff Detail

Event Timeline

nikic created this revision.Feb 13 2021, 7:04 AM
nikic requested review of this revision.Feb 13 2021, 7:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2021, 7:05 AM

Since this is (only?) about avoiding stack overflows (with which stack size?),
i'd think we should go with some much safer and less controversial value;
16, and even 64 should be plenty good still.

nikic added a comment.Feb 13 2021, 1:29 PM

@lebedev.ri As mentioned in the summary, this is not just about stack overflows, but also about exponential queries, with an eye on D92401. The restriction on phi-of-phis without PhiValues analysis is also something that may be relaxed if we have sensible cutoffs. I would expect those things to be much more valuable in terms of analysis power than the ability to handle very deep queries.

Though now that I think on it, it might make more sense to restrict the total number of recursive queries rather than the query depth in particular, which could allow either a deep linear query, or a shallow branching one.

If we want to address stack overflows only (which may well make sense as an intermediate step), then it would probably be good enough to introduce a cutoff at a very high depth (like 512) and not bother with BatchAA consistency at all. The problematic test case reaches a maximum depth of 2507, which is a bit excessive :)

nikic planned changes to this revision.Feb 14 2021, 2:32 PM

I plan to experiment a bit more in this area.

I tested this with 512 threshold and I'm only seeing a notable runtime regression in the testsuite multisource in MallocBench. I would not block on that, the stack overflow issue is a larger concern.

nikic added a comment.Feb 18 2021, 2:22 PM

I've opened D96996 to just add a simple 512 depth limit, without all the extra complexity in this patch. That provides a quick fix for the immediate issue, while leaving a more comprehensive limit for later.

reames added a subscriber: reames.Feb 25 2021, 6:43 PM

Though now that I think on it, it might make more sense to restrict the total number of recursive queries rather than the query depth in particular, which could allow either a deep linear query, or a shallow branching one.

+1 to this thought. Are you working on a patch in this direction? I'm interested in seeing this resolved so that D92401 can be reapplied. I've got a test case that needs something along the lines of D92401 to get decent results, and I don't see any good workarounds. If you need help or motivated review, let me know.

nikic added a comment.Feb 26 2021, 1:17 PM

While changing this to limit number of recursive queries instead of depth, I ran into some issues trying to keep BatchAA consistency. My assumption was that if a recursive query hits the limit and returns MayAlias, the whole query will be MayAlias. However, while this is true for BasicAA in isolation, it isn't true when combined with e.g. GlobalsAA. This means we can't accurately emulate the result of a query as if it weren't cached, or at least I don't see any obvious way. Possibly we need to relax this to "BatchAA should not produce worse results than AA" rather than "BatchAA should produce the same results as AA".