Page MenuHomePhabricator

[BasicAA] Handle recursive queries more efficiently (NFCI)
Needs ReviewPublic

Authored by nikic on Oct 24 2020, 3:55 AM.

Details

Summary

An alias query currently works out roughly like this:

  • Look up location pair in cache.
  • Perform BasicAA logic (including cache lookup and insertion...)
  • Perform a recursive query using BestAAResults.
    • Look up location pair in cache (and thus do not recurse into BasicAA)
    • Query all the other AA providers.
  • Query all the other AA providers.

This is a lot of unnecessary work, all ultimately caused by the BestAAResults query at the end of aliasCheck(). The reason we perform it, is that aliasCheck() is getting called recursively, and we of course want those recursive queries to also make use of other AA providers, not just BasicAA. We can solve this by making the recursive queries directly use BestAAResults (which will check both BasicAA and other providers), rather than recursing into aliasCheck().

There are some tradeoffs:

  • We can no longer pass through the precomputed underlying object to aliasCheck(). This is not a major concern, because nowadays getUnderlyingObject() is quite cheap.
  • Results from other AA providers are no longer cached inside BasicAA. The way this worked was already a bit iffy, in that a result could be cached, but if it was MayAlias, we'd still end up re-querying other providers anyway. If we want to cache non-BasicAA results, we should do that in a more principled manner.

In any case, despite those tradeoffs, this works out to be a decent compile-time improvment: https://llvm-compile-time-tracker.com/compare.php?from=1a7a9efec3cf872dcf3e1a6e6fe39e797c4d9fc6&to=93127a4d7350261ed9ee2ccaa9c369eda2b60198&stat=instructions I think it also simplifies the mental model of how BasicAA works. It took me quite a while to fully understand how these things interact.

Diff Detail

Event Timeline

nikic created this revision.Oct 24 2020, 3:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2020, 3:55 AM
nikic requested review of this revision.Oct 24 2020, 3:55 AM

This looks like a nice approach to clean up the recursion, thank you for working on this!

The issue I'd like to bring up is what is the impact on the actual AA results provided with this change. For compile time this is a clear gain, even if small.
But for run time, I'm seeing a regression of 22% in singlesource/dhrystone_fldry (optimized build on haswell), and a 31% on another non-public benchmark.
I'm also seeing a few nice gains, which may offset these regressions. So I'm trying to understand here, what's the difference in processing that leads to these discrepancies?

nikic added a comment.Wed, Nov 4, 9:56 AM

The issue I'd like to bring up is what is the impact on the actual AA results provided with this change. For compile time this is a clear gain, even if small.
But for run time, I'm seeing a regression of 22% in singlesource/dhrystone_fldry (optimized build on haswell), and a 31% on another non-public benchmark.
I'm also seeing a few nice gains, which may offset these regressions. So I'm trying to understand here, what's the difference in processing that leads to these discrepancies?

Thanks for running those tests! I'm trying to reproduce the Dhrystone fldry results, but wasn't able to produce a binary hash difference before/after this patch after trying various flag combinations (including -fexperimental-new-pass-manager and -farch=haswell). Would it be possible to share the CMake configuration that is being used?

nikic added a comment.Sun, Nov 8, 7:18 AM

I've been staring at this code for a while, but can't spot the pathway which would produce different results. At least under the assumption that BasicAA is the first AA provider, which is the case for standard pipelines.

I've also tried to collect some AliasResults stats (using this hack: https://gist.github.com/nikic/101258f96b9973d63ab2fb92acbca028) in the hope that I can find a difference somewhere, but came up empty.

nikic updated this revision to Diff 305828.Tue, Nov 17, 9:27 AM

Strip pointer casts in GlobalsModRef. This *might* be the cause for observed differences, as previously the call from BasicAA would have been performed with casts stripped.