This is an archive of the discontinued LLVM Phabricator instance.

[clang][NewPM] Add new pass manager RUN lines to avx512f-builtins.c
AbandonedPublic

Authored by craig.topper on Jun 20 2019, 10:53 PM.

Details

Summary

This is split from D63174 to see what it takes to get both pass managers to do the same thing

I've disable -O0 optnone and ran instsimplify on both outputs. This seems to get us converged with some test updates. The only thing I don't like is the tests of true and false comparison predicates with masking. The only IR we ended up with is just loads and stores. The cmp and AND instruction are folded out by instsimplify.

I'll do more tests from D63174 if we think this is a decent direction

Diff Detail

Event Timeline

craig.topper created this revision.Jun 20 2019, 10:53 PM
craig.topper edited the summary of this revision. (Show Details)
craig.topper edited subscribers, added: cfe-commits; removed: llvm-commits.Jun 20 2019, 11:53 PM

Any updates on this? I'm thinking that in the meantime maybe we could commit D63174 and work on this while that lands. If so, we could get an upstream new PM buildbot that can catch any new PM regressions.

I skimmed D63174 but haven't applied either of these patches to test locally, so I may not have the full picture.

IMO, we do not want clang regression tests running -instcombine/-instsimplify. That can cause clang tests to break when an underlying LLVM change is made. Forcing LLVM devs to depend on clang and fix the resulting breakage is backwards and unexpected extra work. This has happened to me several times.

As a compromise to the -O0 IR explosion, we do have precedent for running the optimizer's -mem2reg pass since that doesn't change frequently at this point.

And I haven't tried this, but we do have utils/update_cc_test_checks.py - this is supposed to take the manual labor out of generating assertions in the same way that we do in the optimizer and codegen regression tests with utils/update_test_checks.py and utils/update_llc_test_checks.py. Can you start with that and remove the irrelevant CHECK lines, so only the common/important lines remain? Or just use independent FileCheck '--check-prefixes'?

I skimmed D63174 but haven't applied either of these patches to test locally, so I may not have the full picture.

IMO, we do not want clang regression tests running -instcombine/-instsimplify. That can cause clang tests to break when an underlying LLVM change is made. Forcing LLVM devs to depend on clang and fix the resulting breakage is backwards and unexpected extra work.

+1

This has happened to me several times.

As a compromise to the -O0 IR explosion, we do have precedent for running the optimizer's -mem2reg pass since that doesn't change frequently at this point.

And I haven't tried this, but we do have utils/update_cc_test_checks.py - this is supposed to take the manual labor out of generating assertions in the same way that we do in the optimizer and codegen regression tests with utils/update_test_checks.py and utils/update_llc_test_checks.py. Can you start with that and remove the irrelevant CHECK lines, so only the common/important lines remain? Or just use independent FileCheck '--check-prefixes'?

That script has bitrot and is unusable last time i checked; everyone preferred to manually write broken checklines here :)

I skimmed D63174 but haven't applied either of these patches to test locally, so I may not have the full picture.

IMO, we do not want clang regression tests running -instcombine/-instsimplify. That can cause clang tests to break when an underlying LLVM change is made. Forcing LLVM devs to depend on clang and fix the resulting breakage is backwards and unexpected extra work. This has happened to me several times.

As a compromise to the -O0 IR explosion, we do have precedent for running the optimizer's -mem2reg pass since that doesn't change frequently at this point.

And I haven't tried this, but we do have utils/update_cc_test_checks.py - this is supposed to take the manual labor out of generating assertions in the same way that we do in the optimizer and codegen regression tests with utils/update_test_checks.py and utils/update_llc_test_checks.py. Can you start with that and remove the irrelevant CHECK lines, so only the common/important lines remain? Or just use independent FileCheck '--check-prefixes'?

I definitely agree running -instcombine would be bad since it can replace squences with other sequences. -instsimplify is a little less scary because our intrinsic tests shouldn't really have a lot of things that are trivially reducible. Though that may not be as true as I want it to be. The main issue we seemed to need -instsimplify for with the new pass manager is to merge redundant bitcasts. The inliner in the old pass manager seemed to do that itself, but the new pass manager's always inliner doesn't.

I skimmed D63174 but haven't applied either of these patches to test locally, so I may not have the full picture.

IMO, we do not want clang regression tests running -instcombine/-instsimplify. That can cause clang tests to break when an underlying LLVM change is made. Forcing LLVM devs to depend on clang and fix the resulting breakage is backwards and unexpected extra work. This has happened to me several times.

As a compromise to the -O0 IR explosion, we do have precedent for running the optimizer's -mem2reg pass since that doesn't change frequently at this point.

And I haven't tried this, but we do have utils/update_cc_test_checks.py - this is supposed to take the manual labor out of generating assertions in the same way that we do in the optimizer and codegen regression tests with utils/update_test_checks.py and utils/update_llc_test_checks.py. Can you start with that and remove the irrelevant CHECK lines, so only the common/important lines remain? Or just use independent FileCheck '--check-prefixes'?

I definitely agree running -instcombine would be bad since it can replace squences with other sequences. -instsimplify is a little less scary because our intrinsic tests shouldn't really have a lot of things that are trivially reducible. Though that may not be as true as I want it to be. The main issue we seemed to need -instsimplify for with the new pass manager is to merge redundant bitcasts. The inliner in the old pass manager seemed to do that itself, but the new pass manager's always inliner doesn't.

I think it could be that the new PM Inliner isn't added to the pipeline at -O0. It only seems to be added during optimized runs. @chandlerc might know if this was intentional or not. If so, perhaps these bitcasts are intended and the new PM is still doing its job in this case.

There's some inliner running because the intrinsics are implemented as always_inline functions and they are clearly being inlined in -O0. In a previous post, Chandler said the new PM has a special inliner for always_inline in -O0 and the old pass manager just used the normal inliner.

There's some inliner running because the intrinsics are implemented as always_inline functions and they are clearly being inlined in -O0. In a previous post, Chandler said the new PM has a special inliner for always_inline in -O0 and the old pass manager just used the normal inliner.

Oh I forgot that these were marked always_inline. Yes, this special inliner is the AlwaysInliner which is purposefully designed differently than the normal inliner in the legacy PM according to D23299. I'm proposing that we could perhaps just edit the tests to ignore the bitcasts since the different behavior is intended (the AlwaysInliner isn't doing extra work like combining these bitcasts). This way we can still check for the various intrinsics emitted without their IR instruction mappings getting optimized out, and we won't need to use instsimplify to make sure the IR matches.

Taking an example from my other patch, we'd have something like:

diff --git a/clang/test/CodeGen/avx512f-builtins.c b/clang/test/CodeGen/avx512f-builtins.c
index 15571b639b6..4ad63d73235 100644
--- a/clang/test/CodeGen/avx512f-builtins.c
+++ b/clang/test/CodeGen/avx512f-builtins.c
@@ -10479,7 +10479,7 @@ __m512i test_mm512_maskz_abs_epi64 (__mmask8 __U, __m512i __A)
   // CHECK: [[SUB:%.*]] = sub <8 x i64> zeroinitializer, [[A:%.*]]
   // CHECK: [[CMP:%.*]] = icmp sgt <8 x i64> [[A]], zeroinitializer
   // CHECK: [[SEL:%.*]] = select <8 x i1> [[CMP]], <8 x i64> [[A]], <8 x i64> [[SUB]]
-  // CHECK: select <8 x i1> %{{.*}}, <8 x i64> [[SEL]], <8 x i64> %{{.*}}
+  // CHECK: select <8 x i1> %{{.*}}, <8 x i64> {{.*}}, <8 x i64> %{{.*}}  // Ignore the output of the redundant bitcasts
   return _mm512_maskz_abs_epi64 (__U,__A);
 }

It also seems like for some of these tests that some bitcasts are already ignored.

There's some inliner running because the intrinsics are implemented as always_inline functions and they are clearly being inlined in -O0. In a previous post, Chandler said the new PM has a special inliner for always_inline in -O0 and the old pass manager just used the normal inliner.

Oh I forgot that these were marked always_inline. Yes, this special inliner is the AlwaysInliner which is purposefully designed differently than the normal inliner in the legacy PM according to D23299. I'm proposing that we could perhaps just edit the tests to ignore the bitcasts since the different behavior is intended (the AlwaysInliner isn't doing extra work like combining these bitcasts). This way we can still check for the various intrinsics emitted without their IR instruction mappings getting optimized out, and we won't need to use instsimplify to make sure the IR matches.

Taking an example from my other patch, we'd have something like:

diff --git a/clang/test/CodeGen/avx512f-builtins.c b/clang/test/CodeGen/avx512f-builtins.c
index 15571b639b6..4ad63d73235 100644
--- a/clang/test/CodeGen/avx512f-builtins.c
+++ b/clang/test/CodeGen/avx512f-builtins.c
@@ -10479,7 +10479,7 @@ __m512i test_mm512_maskz_abs_epi64 (__mmask8 __U, __m512i __A)
   // CHECK: [[SUB:%.*]] = sub <8 x i64> zeroinitializer, [[A:%.*]]
   // CHECK: [[CMP:%.*]] = icmp sgt <8 x i64> [[A]], zeroinitializer
   // CHECK: [[SEL:%.*]] = select <8 x i1> [[CMP]], <8 x i64> [[A]], <8 x i64> [[SUB]]
-  // CHECK: select <8 x i1> %{{.*}}, <8 x i64> [[SEL]], <8 x i64> %{{.*}}
+  // CHECK: select <8 x i1> %{{.*}}, <8 x i64> {{.*}}, <8 x i64> %{{.*}}  // Ignore the output of the redundant bitcasts
   return _mm512_maskz_abs_epi64 (__U,__A);
 }

It also seems like for some of these tests that some bitcasts are already ignored.

That would allow the select operands to be completely reversed silently. I'll admit that the intrinsics tests probably already have cases where the checks are weak, but we shouldn't lower the quality of the ones that do better checking already.

What if we just only check the output from the new pass manager. I don't think I care about the differences between the two.

What if we just only check the output from the new pass manager. I don't think I care about the differences between the two.

+1

*ping* Is it ok to proceed with only checking the new PM output for these tests? If so I could just edit my previous patch to remove the legacy PM run lines since they already include the bitcasts from the new PM.

Just to make sure we're on the same page (and sorry I didn't jump in sooner)...

With the old PM, *anything* that is always_inline *gets* instsimplify run on it, even at -O0, even if you didn't want that. So using -instsimplify explicitly is, IMO, not any more scary of a reliance on LLVM's behavior than the old PM already subjected us to...

That said, if the x86 maintainers are comfortable with *only* using the new PM (because it has an always inliner that literally does nothing else and thus has an absolute minimum amount of LLVM transformations applied), I certainly don't have any objections. =D

Just to make sure we're on the same page (and sorry I didn't jump in sooner)...

With the old PM, *anything* that is always_inline *gets* instsimplify run on it, even at -O0, even if you didn't want that. So using -instsimplify explicitly is, IMO, not any more scary of a reliance on LLVM's behavior than the old PM already subjected us to...

That said, if the x86 maintainers are comfortable with *only* using the new PM (because it has an always inliner that literally does nothing else and thus has an absolute minimum amount of LLVM transformations applied), I certainly don't have any objections. =D

My assumption is that eventually there will only be the "new PM". So eventually we'll only be testing that PM. So I don't have any issue testing only it now.

I created D65110 if we're ok with just using the new PM.

I created D65110 if we're ok with just using the new PM.

@craig.topper Is this patch still relevant?

craig.topper abandoned this revision.Aug 20 2019, 3:58 PM

I don't think so.