Page MenuHomePhabricator

[BasicAA] Handle recursive queries more efficiently (NFCI)
ClosedPublic

Authored by nikic on Oct 24 2020, 3:55 AM.

Details

Summary

An alias query currently works out roughly like this:

  • Look up location pair in cache.
  • Perform BasicAA logic (including cache lookup and insertion...)
  • Perform a recursive query using BestAAResults.
    • Look up location pair in cache (and thus do not recurse into BasicAA)
    • Query all the other AA providers.
  • Query all the other AA providers.

This is a lot of unnecessary work, all ultimately caused by the BestAAResults query at the end of aliasCheck(). The reason we perform it, is that aliasCheck() is getting called recursively, and we of course want those recursive queries to also make use of other AA providers, not just BasicAA. We can solve this by making the recursive queries directly use BestAAResults (which will check both BasicAA and other providers), rather than recursing into aliasCheck().

There are some tradeoffs:

  • We can no longer pass through the precomputed underlying object to aliasCheck(). This is not a major concern, because nowadays getUnderlyingObject() is quite cheap.
  • Results from other AA providers are no longer cached inside BasicAA. The way this worked was already a bit iffy, in that a result could be cached, but if it was MayAlias, we'd still end up re-querying other providers anyway. If we want to cache non-BasicAA results, we should do that in a more principled manner.

In any case, despite those tradeoffs, this works out to be a decent compile-time improvment: https://llvm-compile-time-tracker.com/compare.php?from=1a7a9efec3cf872dcf3e1a6e6fe39e797c4d9fc6&to=93127a4d7350261ed9ee2ccaa9c369eda2b60198&stat=instructions I think it also simplifies the mental model of how BasicAA works. It took me quite a while to fully understand how these things interact.

Diff Detail

Event Timeline

nikic created this revision.Oct 24 2020, 3:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2020, 3:55 AM
nikic requested review of this revision.Oct 24 2020, 3:55 AM

This looks like a nice approach to clean up the recursion, thank you for working on this!

The issue I'd like to bring up is what is the impact on the actual AA results provided with this change. For compile time this is a clear gain, even if small.
But for run time, I'm seeing a regression of 22% in singlesource/dhrystone_fldry (optimized build on haswell), and a 31% on another non-public benchmark.
I'm also seeing a few nice gains, which may offset these regressions. So I'm trying to understand here, what's the difference in processing that leads to these discrepancies?

nikic added a comment.Nov 4 2020, 9:56 AM

The issue I'd like to bring up is what is the impact on the actual AA results provided with this change. For compile time this is a clear gain, even if small.
But for run time, I'm seeing a regression of 22% in singlesource/dhrystone_fldry (optimized build on haswell), and a 31% on another non-public benchmark.
I'm also seeing a few nice gains, which may offset these regressions. So I'm trying to understand here, what's the difference in processing that leads to these discrepancies?

Thanks for running those tests! I'm trying to reproduce the Dhrystone fldry results, but wasn't able to produce a binary hash difference before/after this patch after trying various flag combinations (including -fexperimental-new-pass-manager and -farch=haswell). Would it be possible to share the CMake configuration that is being used?

nikic added a comment.Nov 8 2020, 7:18 AM

I've been staring at this code for a while, but can't spot the pathway which would produce different results. At least under the assumption that BasicAA is the first AA provider, which is the case for standard pipelines.

I've also tried to collect some AliasResults stats (using this hack: https://gist.github.com/nikic/101258f96b9973d63ab2fb92acbca028) in the hope that I can find a difference somewhere, but came up empty.

nikic updated this revision to Diff 305828.Nov 17 2020, 9:27 AM

Strip pointer casts in GlobalsModRef. This *might* be the cause for observed differences, as previously the call from BasicAA would have been performed with casts stripped.

Hello. I tried running our downstream benchmarks with this patch and it did not appear to have any effect, either in performance or codesize. (That doesn't mean that nothing is effected, but it's at least a good sign).

asbirlea accepted this revision.Wed, Jan 13, 4:15 PM

lgtm.
Runtime impact results look good.

This revision is now accepted and ready to land.Wed, Jan 13, 4:15 PM

Thank you @dmgreen and @asbirlea for testing!

This revision was automatically updated to reflect the committed changes.
rnk added a subscriber: rnk.Fri, Jan 15, 12:30 PM

I reverted this, it caused crashes while compiling harfbuzz in Chromium: https://bugs.chromium.org/p/chromium/issues/detail?id=1167305

I have the reproducer and am starting creduce soon.

rnk added a comment.Fri, Jan 15, 12:56 PM

CReduce on plain C code is lightning fast, no template goo grind away. I got this reduced test case that crashes when this patch is applied:

char b, c;
void d(char *e, char *__restrict h, long j) {
  char *a = e + j;
  for (; e < a; e++, h++)
    b = *e, *e = *h;
}
int f;
char *g;
void m(long e) {
  char *i, *k;
  char l;
  g = i = k = &l;
  while (k)
    for (; i < k;) {
      if (i)
        d(g, i, e);
      g += e;
      if (f) {
        d(k, &c, e);
        i += e;
      }
    }
}

Compile like so:
clang -cc1 -triple aarch64-unknown-linux-android21 -O3 -emit-llvm hb-ot-map-6508fc.reduced.cpp

nikic added a comment.Fri, Jan 15, 1:06 PM

@rnk Thanks the the reproducer! Unfortunately I'm not seeing the crash. I'm at a1be47b4771467998d7549dcd1b9f9cebdaa9af9 directly before your revert using a Release+Asserts build.

rnk added a comment.Fri, Jan 15, 1:14 PM

I see. Maybe ASLR matters? I have the full stack trace if it helps:
https://reviews.llvm.org/P8253

nikic added a comment.Fri, Jan 15, 1:26 PM

@rnk Sorry, I got my binaries mixed up again... I can reproduce the crash now.

nikic added a comment.Sun, Jan 17, 1:39 AM

I've fixed the issue in b1c2f1282a237e9bc60f1b0020bc7535ca019739 and reapplied this change. The problem was introduced by D91936 and exposed here.