This is an archive of the discontinued LLVM Phabricator instance.

[clang][NewPM] Add RUNS for tests that produce slightly different IR under new PM
AbandonedPublic

Authored by leonardchan on Jun 11 2019, 4:23 PM.

Details

Summary

With the new pass manager enabled by default, some tests produce slightly different IR from the legacy PM. This patch just adds separate new PM runs and checks for the new PM IR.

This fixes:

Clang :: CodeGen/avx512-reduceMinMaxIntrin.c
Clang :: CodeGen/avx512f-builtins.c
Clang :: CodeGen/avx512vl-builtins.c
Clang :: CodeGen/avx512vlbw-builtins.c
Clang :: CodeGenOpenCL/convergent.cl

For CodeGenOpenCL/convergent.cl, the new PM produced a slightly different for loop, but this still checks for no loop unrolling as intended.

For the avx-builtins tests, the new PM would produce extra bitcasts that did not affect the overall logic of the function, but I do not know a simple way to get rid of these bitcasts.

For CodeGen/avx512-reduceMinMaxIntrin.c, temporary local variables are produced in the codegen, but the allocas are out of order and the tests were pretty strict on order even though the logic/use of these variables was the same. Nearly all checks in this large file were broken, so I was hoping we could run it with legacy only for now and come back to it later to make this easier.

Diff Detail

Event Timeline

leonardchan created this revision.Jun 11 2019, 4:23 PM

This really confused me. We shouldn't be seeing this kind of difference in the new PM.

But I think I figured it out.

Both PMs have to run *some* inliner at -O0. This is because we need to inline always_inline functions. The new PM has a *super* simple (and fast compile-time wise) inliner designed for this. It's really good at its job. The old PM runs the normal inliner. This thing isn't simple at all. It does complex stuff to incrementally simplify code while inlining because that can have a huge effect on the overall efficiency of the compiler *when doing full optimization passes*. But we're not doing that.

The result is that because all of these tests check code that is emitted by always_inline functions in our builtin headers, and then *inlined* into the test functions, the inliner in the old pass manager did a bunch of "cleanup" of the code that the new PM doesn't do.

Honestly, I think the new PM's behavior is correct for -O0, but I think it makes these tests less easy to understand. I think we should instead change the command running here, and when we do I think we will find a set of checks that work for both PMs.

When we are willing to do something like run opt in addition to clang (the convergent.cl case), instead of just doing -instnamer we should also do -instcombine. I suspect this will remove the differences.

When we're just directly using %clang_cc1 I suspect that rather than checking the completely unoptimized code we should check with -O1 to make the IR much more stable and easy to inspect. It should also result in identical results from the two PMs.

clang/test/CodeGenOpenCL/convergent.cl
2

Probably want to use opt -passes=instnamer or some such to use the new PM in the opt invocation as well...

This comment was removed by leonardchan.

For the avx tests, I don't suppose you know a simple way to generate these tests? They're about 10k lines long and it's taking a while to go through it by hand to replace the current IR checks with codegen for -O1. (Perhaps something along the lines of utils/update_llc_test_checks.py in llvm?)

leonardchan marked an inline comment as done.
leonardchan added a reviewer: craig.topper.

OK so I instead wrote a script that generated the tests for me since changing each individual case by hand was painful. All the -cc1 tests now match, although I don't know any ways of simplifying the IR further to help reduce test size. I would definitely love to know if there are any ways to simplify the instructions. Unfortunately for convergent.cl, the codegen still differs slightly around the end of the loop when using -instcombine in addition to instnamer:

< for.body:                                         ; preds = %for.body, %entry
<   %i.03 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
---
> for.body:                                         ; preds = %for.body.for.body_crit_edge, %entry
>   %inc.phi = phi i32 [ 1, %entry ], [ %inc.1, %for.body.for.body_crit_edge ]
92,94c92,97
<   %inc = add nuw nsw i32 %i.03, 1
<   %cmp = icmp ult i32 %inc, 10
<   br i1 %cmp, label %for.body, label %for.cond.cleanup
---
>   %cmp = icmp ult i32 %inc.phi, 10
>   br i1 %cmp, label %for.body.for.body_crit_edge, label %for.cond.cleanup
> 
> for.body.for.body_crit_edge:                      ; preds = %for.body
>   %inc.1 = add nuw nsw i32 %inc.phi, 1
>   br label %for.body

Also adding @craig.topper in case he as any insight on the updated tests.

clang/test/CodeGenOpenCL/convergent.cl
2

Unfortunately there doesn't seem to be a new PM equivalent for instnamer (but one does exist for instcombine). For simplicity, should we stick with using the legacy driver for now?

OMG, I'm so sorry, I had no idea that the tests would explode like that... Yeah, I don't think that's useful....

Maybe a better approach is to just explicitly run the code through opt -passes=instsimplify before handing it to FileCheck? That should produce almost identical results to the old PM's inliner?

Feel free to just try this out and report back -- if it isn't looking good, don't invest a bunch of time in it, we should talk to Craig and figure out what the right long-term strategy here is.

OMG, I'm so sorry, I had no idea that the tests would explode like that... Yeah, I don't think that's useful....

Maybe a better approach is to just explicitly run the code through opt -passes=instsimplify before handing it to FileCheck? That should produce almost identical results to the old PM's inliner?

Feel free to just try this out and report back -- if it isn't looking good, don't invest a bunch of time in it, we should talk to Craig and figure out what the right long-term strategy here is.

Hmm so -instsimplify and combinations of other passes like intsnamer and instcombine don't seem to clean this up any further. The IR still differs between PMs (at -O0), and I can provide the diffs if necessary.

In the short term, I imagine an "ok" solution for now would be to just run the original tests with -fno-experimental-new-pas-manager, and either have a separate file with these new PM -O1 checks at or separate runs in the same file to prevent regressions. I emailed Craig for his insight and info on these tests.

For now I reverted back to an early diff for this with the -O0 tests only.

chandlerc accepted this revision.Jun 20 2019, 7:41 PM

Eh, this seems close enough now. I'd like a better approach for the x86 builtins, but no idea what it will end up being.

This revision is now accepted and ready to land.Jun 20 2019, 7:41 PM

I'm going to try to work on the X86 tests. Can we hold off on committing those?

I'm going to try to work on the X86 tests. Can we hold off on committing those?

Ok. For now I'll just commit the convergent.cl test separately.

leonardchan abandoned this revision.Jul 29 2019, 12:15 PM

Replaced with D65110