Page MenuHomePhabricator

[BasicAA] Enable -basic-aa-recphi by default
ClosedPublic

Authored by dmgreen on Jul 1 2020, 2:43 PM.

Details

Summary

This option was added a while back, to help improve AA around pointer phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can then prove more precise aliasing info. It should not have an effect on compile time, only adding a couple of extra checks to existing code.

Requires D82987.

Diff Detail

Event Timeline

dmgreen created this revision.Jul 1 2020, 2:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2020, 2:43 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
dmgreen updated this revision to Diff 274933.Jul 1 2020, 3:05 PM

Fixup basic-aa argument.

fhahn added a comment.Jul 8 2020, 6:19 AM

Do you have any stats on how often this tiggers?

It should be triggering relatively often, from looking at the tests/benchmarks I was seeing. Any code that uses pointer induction variables can trigger it. Whether that will common will be a matter of the style of the original code, and whether it actually proves move NoAlias will be variable. But it seems to come up enough to make code improvements in a fair number of cases.

From what I can tell it should be almost free though, in terms of compile time. Just an extra check of whether a phi is a simple recursion.

It should be triggering relatively often, from looking at the tests/benchmarks I was seeing. Any code that uses pointer induction variables can trigger it. Whether that will common will be a matter of the style of the original code, and whether it actually proves move NoAlias will be variable. But it seems to come up enough to make code improvements in a fair number of cases.

From what I can tell it should be almost free though, in terms of compile time. Just an extra check of whether a phi is a simple recursion.

Baseline checks are generally self hosting and running the test suite. Make sure that everything looks clean. Check for regressions (compile time or execution time). Have you already done that? Also, checking on how many times this change triggers during that process is useful information.

Thanks for the info. That's very useful.

I think I need to find a way to get better compile time numbers. Whenever I try to run them they come up so noisy as to be almost useless. I am in the very lucky position where all of the benchmarks I run are bare-metal and deterministic, so I can very accurately get performance numbers with zero noise in them (at least of the "I run the same binary twice and get the same result" kind). Those benchmarks showed this to be a improvement, and codesize test we run at -Oz where essentially flat with some small ups and downs.

I had run the llvm-test-suite for testing, but not for performance. That was what pointed at the problems in D82987. Luckily there was a gcc torture test that pointed out concisely what the problem was without having to go find out which part of cjpeg was broken. With that fixed the rest of the suite passed, along with a bootstrap. I also ran some csmith testing overnight some times back with a this and a couple of other patches in it, but that didn't show any problems even when D82987 wasn't fixed, so probably doesn't mean much.

I can try and get some statistics. I usually trust the "zero noise" benchmarks more, due to showing performance more directly than statistics or noisy results usually do. But I acknowledge that they may be different to the test-suite. Plus I guess for basic-aa checking the number of noalias vs mayalias calls can be a more direct indication of how AA on it's own is performing.

hfinkel accepted this revision.Jul 9 2020, 4:31 AM

Given that compile time looks okay (and it's clearly doing something, as its causing more simplification (or, at least, fewer instructions)), as does correctness, this LGTM. If we notice any significant performance problems, we can revert (but given that we're relatively close to branching, we should commit this sooner rather than later to give the maximum opportunity to spot such things).

This revision is now accepted and ready to land.Jul 9 2020, 4:31 AM

Thanks.

I had started collecting some statistics. It appears this patch lowers the number of calls to alias(..) by 0.8%, which matches up with what @nikic was seeing. The percentage of calls that end up as No/Must/Partial alias goes up from 65.7% to 66%. Which probably doesn't tell you a lot on it's own considering all the things that could be going on, but it's in the right direction.

Like I said, my benchmarks show this to be an improvement, and it should give better aliasing info. If it does cause problems for anyone I am of course happy to take look.

This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Jul 9 2020, 7:00 AM

Thanks.

I had started collecting some statistics. It appears this patch lowers the number of calls to alias(..) by 0.8%, which matches up with what @nikic was seeing. The percentage of calls that end up as No/Must/Partial alias goes up from 65.7% to 66%. Which probably doesn't tell you a lot on it's own considering all the things that could be going on, but it's in the right direction.

Like I said, my benchmarks show this to be an improvement, and it should give better aliasing info. If it does cause problems for anyone I am of course happy to take look.

yeah that should be fine, the main thing I was worried about is compile-time here, but it sounds like we are good there.

Hi,

I noticed that we got a runtime failure with this patch for our out-of-tree target. I haven't gone to the bottom with it yet, but I saw that a colleague of mine wrote a PR about a miscompile with -basic-aa-recphi that at least hasn't been handled according to bugzilla:
https://bugs.llvm.org/show_bug.cgi?id=37952

Thanks. I will take a look.

From this, that bug appears to be fixed: https://godbolt.org/z/6W8hc8. Probably from D82987 I would expect. I will check. Opt 10.0.0 does show the problem, where as trunk seems to be fixed.

Let me know about the runtime failure.

Thanks. I will take a look.

From this, that bug appears to be fixed: https://godbolt.org/z/6W8hc8. Probably from D82987 I would expect. I will check. Opt 10.0.0 does show the problem, where as trunk seems to be fixed.

Let me know about the runtime failure.

Ok thanks! We'll dig further in the new error.

hans added a subscriber: hans.Jul 10 2020, 3:23 AM

We're seeing a runtime error in Chromium too, tracked at https://bugs.chromium.org/p/chromium/issues/detail?id=1103818

I'll try to provide more details shortly.

OK Thanks. I'll revert the patch and we can see what's up in the reproducers.

bjope added a subscriber: bjope.Jul 10 2020, 5:43 AM

Here is a reproducer for the new problem that @uabelho noticed. Might be possible to reduce it further, but I think it at least show the problem:

; opt -basic-aa-recphi=0 -gvn -o - -S
; opt -basic-aa-recphi=1 -gvn -o - -S

declare i16 @myprintf(i32)

define i16 @main(i16 %argc.5.par, i16** nocapture readnone %argv.6.par) {
  %int_arr.10 = alloca [3 x i16], align 1
  %_tmp1 = getelementptr inbounds [3 x i16], [3 x i16]* %int_arr.10, i16 0, i16 2
  br label %bb1

bb1:                                              ; preds = %bb1, %0
  %i.7.0 = phi i16 [ 2, %0 ], [ %_tmp5, %bb1 ]
  %ls1.9.0 = phi i16* [ %_tmp1, %0 ], [ %_tmp7, %bb1 ]
  store i16 %i.7.0, i16* %ls1.9.0, align 1
  %_tmp5 = add nsw i16 %i.7.0, -1
  %_tmp7 = getelementptr i16, i16* %ls1.9.0, i16 -1
  %_tmp9 = icmp sgt i16 %i.7.0, 0
  br i1 %_tmp9, label %bb1, label %bb3

bb3:                                              ; preds = %bb1
  %_tmp11 = getelementptr inbounds [3 x i16], [3 x i16]* %int_arr.10, i16 0, i16 1
  %_tmp12 = load i16, i16* %_tmp11, align 1
  %_tmp13 = sext i16 %_tmp12 to i32
  %_tmp16 = call i16 @myprintf(i32 %_tmp13)
  %_tmp18.not = icmp eq i16 %_tmp12, 1
  br i1 %_tmp18.not, label %bb5, label %bb4

bb4:                                              ; preds = %bb3
  ret i16 1

bb5:                                              ; preds = %bb3, %bb4
  ret i16 0
}

See https://godbolt.org/z/j59xxh for the diff.

Thanks. I was just trying to come up with a reproducer that used negative gep constants! But the example I was trying didn't seem to be showing the same problem. It does make sense that could cause problems though.

I reverted the option as rGe1135b486aae and will try to put together a fix for the issue.

hans added a comment.Jul 10 2020, 7:45 AM

Apologies for not having a better repro of the Chromium/libpng problem. Once there's a fix, I'd be happy to try it out in our build if you'd like.

Not a problem. These problems can get very complex in real code.

I have put up D83576 to hopefully fix the issue, but won't attempt to turn the option back on until after the new branch gets created.

hans added a comment.Mon, Jul 13, 9:37 AM

Not a problem. These problems can get very complex in real code.

I have put up D83576 to hopefully fix the issue, but won't attempt to turn the option back on until after the new branch gets created.

I tried out D83576 on top of af839a96187e3538d63ad57571e4bdf01e2b15c5 (the flag enabling), but the miscompile in Chromium still remains.

I identified the .o file that changes in a bad way and put together steps to reproduce here: https://bugs.chromium.org/p/chromium/issues/detail?id=1103818#c12

OK Thanks. I'll take a look. That reproducer looks pretty useful.

I was thinking about trying to reproduce a full chromium build. But hadn't looked yet. I'll try and see what is wrong here in the smaller case first.

D83576 is now in. The other issue in chromium looks like an incorrect use of restrict causing the compiler to now exploit some undefined behaviour in their optimizations to zlib. @hans has some more details in the chromium ticket above.

I'll see what I can do about that case, run some benchmarks and see how best to fix it. I will try and make sure that is fixed before this is re-committed.

Thanks to Adenilson and Hans I believe chromium has now fixed the issue their end. I have also ran another bootstrap and the llvm-test-suite, which are both still doing OK.

If no-one has objections, I'll give enabling the option another go.

OK. Lets give this another go. There is one additional test update where we manage to convert a loop to a memcpy call.

Same rules as before - let me know if this still causes any problems.

bjope added a comment.Tue, Aug 4, 3:51 AM

Thanks for the fixes Dave. Will make sure that we start using this in our (downstream) fuzzy testing again.

Sounds good. Let me know if you find anything.