This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][Clang] Add some RVV Floating-Point intrinsic functions. (vfclass, vfmerge, vfrec7, vfrsqrt7, vfsqrt)
ClosedPublic

Authored by khchen on Apr 1 2021, 9:26 AM.

Details

Summary

Support vfclass, vfmerge, vfrec7, vfrsqrt7, vfsqrt instructions.

Diff Detail

Event Timeline

khchen created this revision.Apr 1 2021, 9:26 AM
khchen requested review of this revision.Apr 1 2021, 9:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2021, 9:26 AM
khchen retitled this revision from [RISCV][Clang] Add some RVV Floating-Point intrinsic functions. to [RISCV][Clang] Add some RVV Floating-Point intrinsic functions. (vfclass, vfmerge, vfrec7, vfrsqrt7, vfsqrt).
This revision is now accepted and ready to land.Apr 6 2021, 10:22 AM
This revision was landed with ongoing or failed builds.Apr 11 2021, 7:30 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Apr 14 2021, 4:36 PM

One of the ten commits

One of the ten commits that landed here (https://github.com/llvm/llvm-project/compare/a3bfddbb6a27...59d5b8c27b43), this being one of them, slowed down check-clang by over 20%: https://bugs.llvm.org/show_bug.cgi?id=49962

Since recovering test runtime regressions is very difficult, it's probably a good idea to revert this first and then reland in smaller chunks to see which of the commits causes the problem (?)

One of the ten commits that landed here (https://github.com/llvm/llvm-project/compare/a3bfddbb6a27...59d5b8c27b43), this being one of them, slowed down check-clang by over 20%: https://bugs.llvm.org/show_bug.cgi?id=49962

Since recovering test runtime regressions is very difficult, it's probably a good idea to revert this first and then reland in smaller chunks to see which of the commits causes the problem (?)

The fact that we're running the optimization pipeline in these tests might be to blame. Might also be that riscv_vector.h is currently about 71000 lines which probably isn't quick to parse.

The fact that we're running the optimization pipeline in these tests might be to blame. Might also be that riscv_vector.h is currently about 71000 lines which probably isn't quick to parse.

I found removing all overloaded api definition functions in riscv_vector.h could save the time more than 100% (run llvm-lit with one file vloxei.c, cost 3.57s-> 1.56s)
I'm not sure how to improve the riscv_vector.h, is it possible to generate a PCH for riscv_vector.h?

rogfer01 added a comment.EditedApr 15 2021, 12:09 AM

One of the ten commits that landed here (https://github.com/llvm/llvm-project/compare/a3bfddbb6a27...59d5b8c27b43), this being one of them, slowed down check-clang by over 20%: https://bugs.llvm.org/show_bug.cgi?id=49962

Since recovering test runtime regressions is very difficult, it's probably a good idea to revert this first and then reland in smaller chunks to see which of the commits causes the problem (?)

The fact that we're running the optimization pipeline in these tests might be to blame. Might also be that riscv_vector.h is currently about 71000 lines which probably isn't quick to parse.

I hacked the RISCVVEmitter.cpp file to emit a riscv_vector_testing.h that wraps all the definitions with macros like RVV_TEST_vadd and RVV_TEST_vadd_mask (I used the IRName as it was easier for the experiment).

The testing header looks like this

#if defined(RVV_TEST_vadd)                                                       
__rvv_overloaded vint8m1_t vadd(vint8m1_t op0, vint8m1_t op1, size_t op2){       
  return vadd_vv_i8m1(op0, op1, op2);                                            
}                                                                                
#endif // RVV_TEST_vadd                                                          
                                                                                 
#if defined(RVV_TEST_vadd_mask)                                                  
__rvv_overloaded vint8m1_t vadd(vbool8_t op0, vint8m1_t op1, vint8m1_t op2, vint8m1_t op3, size_t op4){
  return vadd_vv_i8m1_m(op0, op1, op2, op3, op4);                                
}                                                                                
#endif // RVV_TEST_vadd_mask                                                     

In a release build (in a machine that is not very fast) looks like this:

$ ./bin/llvm-lit --time-test -sv ../llvm-src/clang/test/CodeGen/RISCV/rvv-intrinsics/vadd.c ../llvm-src/clang/test/CodeGen/RISCV/rvv-intrinsics/vadd_fast.c 
llvm-lit: /home/rferrer/fio/upstream/llvm-src/llvm/utils/lit/lit/llvm/config.py:428: note: using clang: /home/rferrer/fio/upstream/llvm-build-release/bin/clang
Slowest Tests:
--------------------------------------------------------------------------
8.91s: Clang :: CodeGen/RISCV/rvv-intrinsics/vadd.c
2.11s: Clang :: CodeGen/RISCV/rvv-intrinsics/vadd_fast.c

Where vadd.c is the current file and vadd_fast.c is the same as vadd.c with this change

--- vadd.c	2021-04-15 06:51:30.554996630 +0000
+++ vadd_fast.c	2021-04-15 07:00:38.684074973 +0000
@@ -8,7 +8,9 @@
 // RUN:   -Werror -Wall -o - %s -S >/dev/null 2>&1 | FileCheck --check-prefix=ASM --allow-empty %s
 
 // ASM-NOT: warning
-#include <riscv_vector.h>
+#define RVV_TEST_vadd
+#define RVV_TEST_vadd_mask
+#include <riscv_vector_testing.h>
 
 // CHECK-RV32-LABEL: @test_vadd_vv_i8mf8(
 // CHECK-RV32-NEXT:  entry:

This seems to suggest that making sure the tested header contains only the necessary will significantly speed up the testing.

We would want to make sure riscv_vector.h and riscv_vector_testing.h are equivalent when all the macros are enabled. I guess such a test is feasible if we preprocess the testing header with all the RVV_TEST_nnn macros enabled and then we do some form of textual comparison (which will likely have to ignore whitespace). This way we can test riscv_vector.h using its proxy riscv_vector_testing.h.

I've left a PoC of my approach here https://reviews.llvm.org/D100529 in case we can use that

Was there any more progress on this? If not, let's revert for now to not permanently slow down tests by over 20%.

Was there any more progress on this? If not, let's revert for now to not permanently slow down tests by over 20%.

https://reviews.llvm.org/D100611
https://reviews.llvm.org/D100617

Do you mean revert https://github.com/llvm/llvm-project/compare/a3bfddbb6a27...59d5b8c27b43?

Was there any more progress on this? If not, let's revert for now to not permanently slow down tests by over 20%.

https://reviews.llvm.org/D100611
https://reviews.llvm.org/D100617

Great, let's wait and see for a few days to see if it lands, and if so how much that recovers then.

Do you mean revert https://github.com/llvm/llvm-project/compare/a3bfddbb6a27...59d5b8c27b43?

Yes.

Was there any more progress on this? If not, let's revert for now to not permanently slow down tests by over 20%.

https://reviews.llvm.org/D100611
https://reviews.llvm.org/D100617

Great, let's wait and see for a few days to see if it lands, and if so how much that recovers then.

Do you mean revert https://github.com/llvm/llvm-project/compare/a3bfddbb6a27...59d5b8c27b43?

Yes.

Please wait a few days.
Those commits add almost total 300k line from all tests.
I think maybe reverting is not a good idea if we have a good progress on this slow down issue.

Please wait a few days.
Those commits add almost total 300k line from all tests.
I think maybe reverting is not a good idea if we have a good progress on this slow down issue.

Yes, if we can fix the regression in a few days, then all is good.

I think maybe reverting is not a good idea if we have a good progress on this slow down issue.

Yes, if we can fix the regression in a few days, then all is good.

Another ping here. check-clang is down to 5 min now, but it's still a 11% regression (used to be 4.5 min, went to 5.5 min, now 5 min).

I think maybe reverting is not a good idea if we have a good progress on this slow down issue.

Yes, if we can fix the regression in a few days, then all is good.

Another ping here. check-clang is down to 5 min now, but it's still a 11% regression (used to be 4.5 min, went to 5.5 min, now 5 min).

https://reviews.llvm.org/D100611 has not landed yet. Is the 5 min number with or without that patch?

I think maybe reverting is not a good idea if we have a good progress on this slow down issue.

Yes, if we can fix the regression in a few days, then all is good.

Another ping here. check-clang is down to 5 min now, but it's still a 11% regression (used to be 4.5 min, went to 5.5 min, now 5 min).

https://reviews.llvm.org/D100611 has not landed yet. Is the 5 min number with or without that patch?

Without. I'm just looking at my bot cycle times. Ok, let's wait for that to play out :)

@thakis
https://reviews.llvm.org/D100611 had landed, could you check your bot to see the cycle time?

No effect, or even worse. We're back at 5:30 cycle time and we'v been in that state for over 2 weeks now. I think it's time to revert back to green, so this slowdown doesn't become permanent.

(here's a build form a few minutes ago: http://45.33.8.238/macm1/8325/summary.html)

craig.topper added a comment.EditedApr 27 2021, 4:36 PM

I believe we had reductions at builds 7601 and 8175.

It looks like there may have been an increase between builds 7979 and 7991 the build failed for a while there. I suspect "e951b04 [AArch64][SVE] Regression test all ACLE tests with C++" as it added a quite a few RUN lines.

I think we also caused an increase at build 8202 when these 4 commits went in
bd32c2d [RISCV] Implement the vwcvt{u}.x.x.v/vncvt.x.x.w builtin.
645c5f2 [RISCV] Implement the pseudo compare builtin.
bfb3fca [RISCV] Implement the vfabs.v/vfneg.v builtin.
4b24341 [RISCV] Implement the vmmv.m/vmnot.m builtin.

I believe we had reductions at builds 7601 and 8175.

It looks like there may have been an increase between builds 7979 and 7991 the build failed for a while there. I suspect "e951b04 [AArch64][SVE] Regression test all ACLE tests with C++" as it added a quite a few RUN lines.

I think we also caused an increase at build 8202 when these 4 commits went in
bd32c2d [RISCV] Implement the vwcvt{u}.x.x.v/vncvt.x.x.w builtin.
645c5f2 [RISCV] Implement the pseudo compare builtin.
bfb3fca [RISCV] Implement the vfabs.v/vfneg.v builtin.
4b24341 [RISCV] Implement the vmmv.m/vmnot.m builtin.

Any disagreement about reverting riscv builtin changes until there's some way to test them without increasing check-clang time by 20%?

I believe we had reductions at builds 7601 and 8175.

It looks like there may have been an increase between builds 7979 and 7991 the build failed for a while there. I suspect "e951b04 [AArch64][SVE] Regression test all ACLE tests with C++" as it added a quite a few RUN lines.

I think we also caused an increase at build 8202 when these 4 commits went in
bd32c2d [RISCV] Implement the vwcvt{u}.x.x.v/vncvt.x.x.w builtin.
645c5f2 [RISCV] Implement the pseudo compare builtin.
bfb3fca [RISCV] Implement the vfabs.v/vfneg.v builtin.
4b24341 [RISCV] Implement the vmmv.m/vmnot.m builtin.

Any disagreement about reverting riscv builtin changes until there's some way to test them without increasing check-clang time by 20%?

We're preparing a patch to remove to stop testing both rv32 and rv64 on every test. That should reduce the time by half. What is an acceptable number?

I believe we had reductions at builds 7601 and 8175.

It looks like there may have been an increase between builds 7979 and 7991 the build failed for a while there. I suspect "e951b04 [AArch64][SVE] Regression test all ACLE tests with C++" as it added a quite a few RUN lines.

I think we also caused an increase at build 8202 when these 4 commits went in
bd32c2d [RISCV] Implement the vwcvt{u}.x.x.v/vncvt.x.x.w builtin.
645c5f2 [RISCV] Implement the pseudo compare builtin.
bfb3fca [RISCV] Implement the vfabs.v/vfneg.v builtin.
4b24341 [RISCV] Implement the vmmv.m/vmnot.m builtin.

Any disagreement about reverting riscv builtin changes until there's some way to test them without increasing check-clang time by 20%?

We're preparing a patch to remove to stop testing both rv32 and rv64 on every test. That should reduce the time by half. What is an acceptable number?

I have removed rv32 test cases for vector intrinsics in https://github.com/llvm/llvm-project/commit/b358a2be52480c008a0789c78f5df1bb2053bfd8.

thakis added a subscriber: rnk.Apr 30 2021, 6:22 PM

! In D99741#2721669, @craig.topper wrote:
We're preparing a patch to remove to stop testing both rv32 and rv64 on every test. That should reduce the time by half. What is an acceptable number?

You tell me. I'd say well below 1%?

I wanted to prepare a graph that shows check-clang time over the last year – it stayed basically constant and the jumped by 20% (now 10% now that half the tests are disabled). It'd look dramatic, but I haven't had time to make a graph. I'm sure you can imagine it though :) (Also, I feel the burden of proof shouldn't be on people pointing out regressions.)

@rnk suggested putting these tests behind some kind of REQUIRES: expensive-checks thing that's off by default, instead of reverting everything, until perf is sorted out. That sounds like a great idea to me – wdyt?

craig.topper added a comment.EditedMay 1 2021, 2:53 PM

! In D99741#2721669, @craig.topper wrote:
We're preparing a patch to remove to stop testing both rv32 and rv64 on every test. That should reduce the time by half. What is an acceptable number?

You tell me. I'd say well below 1%?

I wanted to prepare a graph that shows check-clang time over the last year – it stayed basically constant and the jumped by 20% (now 10% now that half the tests are disabled). It'd look dramatic, but I haven't had time to make a graph. I'm sure you can imagine it though :) (Also, I feel the burden of proof shouldn't be on people pointing out regressions.)

@rnk suggested putting these tests behind some kind of REQUIRES: expensive-checks thing that's off by default, instead of reverting everything, until perf is sorted out. That sounds like a great idea to me – wdyt?

I just uploaded D101700 which takes lit on clang/test/CodeGen/RISCV/rvv* from ~8 seconds to ~5.5 seconds as measured on a 48 core server I use at work. Not sure how that translates to a less capable machine with less cores, slower disk, etc.

For comparison, here are some times for running lit on subsets of tests on this machine before the patch. Mostly looked at clang/test/CodeGen as I was curious to see how close we are to other targets.

clang/test/                 - 68.75 seconds
clang/test/CodeGen          - 24.59 seconds
clang/test/CodeGen/X86      - 2.68 seconds
clang/test/CodeGen/aarch64* - 12.33 seconds
clang/test/CodeGen/arm*     - 2.04 seconds

We're also considering D100529 to further reduce the RISCV test time. Another option we've talked about is concatenating some of the tests into a single file to cut down the number of times we parse riscv_vector.h.

At the time the check-clang time increase was first raised, lit on clang/test/CodeGen/RISCV/rvv* was 22.57 seconds on my machine. clang/test/CodeGen/ was 34.15 seconds.