This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] Added coshf function.
ClosedPublic

Authored by orex on Jul 7 2022, 4:14 AM.

Details

Summary

Latest performance:

CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='' ./perf.sh coshf
GNU libc version: 2.31
GNU libc release: stable
18.077
12.679
13.202
CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='--latency' ./perf.sh coshf
GNU libc version: 2.31
GNU libc release: stable
48.662
37.058
51.001

Diff Detail

Event Timeline

orex created this revision.Jul 7 2022, 4:14 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 7 2022, 4:14 AM
orex requested review of this revision.Jul 7 2022, 4:14 AM

I couldn't build this patch on top of 'main' (revision 81af344):

/localdisk/zimmerma/llvm-project/libc/src/math/generic/coshf.cpp:11:10: fatal error: 'src/math/generic/expxf.h' file not found
#include "src/math/generic/expxf.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~

Is there any dependency?

orex added a comment.Jul 12 2022, 4:58 AM

Yes. You should first apply D129005, after D129215 and after that one. The initial revision were cut to several ones to improve review process and new function deployment.

Yes. You should first apply D129005, after D129215 and after that one. The initial revision were cut to several ones to improve review process and new function deployment.

thank you. I get one patch failure when I merge the last patch:

$ patch -p1 -i /tmp/D129275.diff
patching file libc/config/darwin/arm/entrypoints.txt
Hunk #1 succeeded at 111 (offset 1 line).
patching file libc/config/linux/aarch64/entrypoints.txt
Hunk #1 succeeded at 130 (offset 1 line).
patching file libc/config/linux/x86_64/entrypoints.txt
Hunk #1 succeeded at 136 (offset 1 line).
patching file libc/config/windows/entrypoints.txt
Hunk #1 succeeded at 114 (offset 1 line).
patching file libc/spec/stdc.td
patching file libc/src/__support/FPUtil/FPBits.h
Hunk #1 FAILED at 57.
Hunk #2 succeeded at 154 (offset 4 lines).
1 out of 2 hunks FAILED -- saving rejects to file libc/src/__support/FPUtil/FPBits.h.rej
patching file libc/src/math/CMakeLists.txt
patching file libc/src/math/coshf.h
patching file libc/src/math/generic/CMakeLists.txt
patching file libc/src/math/generic/coshf.cpp
patching file libc/test/src/math/CMakeLists.txt
Hunk #1 succeeded at 1314 (offset 3 lines).
patching file libc/test/src/math/coshf_test.cpp
patching file libc/test/src/math/exhaustive/CMakeLists.txt
patching file libc/test/src/math/exhaustive/coshf_test.cpp
patching file libc/utils/MPFRWrapper/MPFRUtils.h
patching file libc/utils/MPFRWrapper/MPFRUtils.cpp

This is on top of revision 1301995 (main).

orex added a comment.Jul 12 2022, 5:52 AM

Can you try to put all the chain on top of the revision 60d6be5dd3f411cfe1b5392cbb... for now. I'll rebase the revisions to the last main tonight.

Can you try to put all the chain on top of the revision 60d6be5dd3f411cfe1b5392cbb... for now. I'll rebase the revisions to the last main tonight.

thank you, it works perfectly. All exhaustive tests do pass for the 4 rounding modes. For the efficiency I get on a AMD EPYC 7282 with gcc 10.2.1 and clang 11.0.1-2:

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_LAUNCHER="/localdisk/zimmerma/glibc-2.35/install/lib/ld-linux-x86-64.so.2 --library-path /localdisk/zimmerma/glibc-2.35/install/lib" CORE_MATH_PERF_MODE=rdtsc ./perf.sh coshf
GNU libc version: 2.35
GNU libc release: stable
17.537
15.789
35.447

which means 18 cycles for CORE-MATH, 16 cycles for glibc 2.35, and 35 for llvm-libc.

lntue added inline comments.Jul 19 2022, 9:45 PM
libc/src/__support/FPUtil/FPBits.h
62

Why do we need extra inline's here? They should be implicit for class methods with definitions in the headers?

libc/src/math/generic/coshf.cpp
44–46

Add comments about your expanded formula / computations:

exp(x) = ep_p.mult_exp * (ep_p.r + 1)
exp(-x) = ep_m.mult_exp * (ep_m.r + 1)
cosh(x) = (exp(x) + exp(-x)) / 2 = ...

In the evaluation, it looks like there is (... - 1.0) + (... - 1.0) followed by (0.5 * ...) + 1.0 so all the 1.0's are actually cancelled. Maybe this can be simplified to:

ep = fputil::multiply_add(ep_p.mult_exp, ep_p.r, ep_p.mult_exp) +
     fputil::multiply_add(ep_m.mult_exp, ep_m.r, ep_m.mult_exp);
return 0.5 * ep;

And the 0.5 * ep can even be dropped with an update in exp_eval.

libc/test/src/math/CMakeLists.txt
1320

Unit test doesn't need NO_RUN_POSTBUILD, unless you only want to run it manually.

libc/test/src/math/coshf_test.cpp
65

0.5 for tolerance?

72

0.5 for tolerance?

77

0.5 for tolerance?

orex updated this revision to Diff 448284.Jul 28 2022, 3:49 AM

Rebasing on main with small fixes.

orex edited the summary of this revision. (Show Details)Jul 28 2022, 3:56 AM
orex updated this revision to Diff 448303.Jul 28 2022, 4:57 AM
orex marked 6 inline comments as done.

Review fixes.

lntue added inline comments.Jul 28 2022, 6:27 AM
libc/src/math/generic/CMakeLists.txt
1137

Add expxf.h to HDRS and nearest_integer to DEPENDS

orex updated this revision to Diff 448322.Jul 28 2022, 6:38 AM

Review fixes.

lntue accepted this revision.Jul 28 2022, 6:43 AM

Looks good to me. Let's wait for @zimmermann6 to confirm the accuracy and performance.

This revision is now accepted and ready to land.Jul 28 2022, 6:43 AM

I get slightly different figures on my machine:

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh coshf
GNU libc version: 2.33
GNU libc release: release
17.730
19.322
22.815
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh coshf
GNU libc version: 2.33
GNU libc release: release
49.478
48.614
75.194
orex added a comment.Jul 29 2022, 3:42 AM

Thank you Paul for sharing the results.
The results I got using the same compiler (llvm 11) as Paul (@zimmermann6):

CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='' ./perf.sh coshf
GNU libc version: 2.31
GNU libc release: stable
18.534
13.019
17.590
CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='--latency' ./perf.sh coshf
GNU libc version: 2.31
GNU libc release: stable
49.670
38.334
50.461

As you can see llvm 12 significantly improve throughput of this version of coshf over version 11. Partially this problem can be explained by this difference. Another source of the difference is Intel vs AMD. We observe such difference with (@lntue).
Paul, can you confirm, that the precision is OK? I think that we can push the changes even though the solution is not the fastest for all platforms/compilers? Tue?

lntue added a comment.Jul 29 2022, 6:31 AM

Thank you Paul for sharing the results.
The results I got using the same compiler (llvm 11) as Paul (@zimmermann6):

CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='' ./perf.sh coshf
GNU libc version: 2.31
GNU libc release: stable
18.534
13.019
17.590
CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='--latency' ./perf.sh coshf
GNU libc version: 2.31
GNU libc release: stable
49.670
38.334
50.461

As you can see llvm 12 significantly improve throughput of this version of coshf over version 11. Partially this problem can be explained by this difference. Another source of the difference is Intel vs AMD. We observe such difference with (@lntue).
Paul, can you confirm, that the precision is OK? I think that we can push the changes even though the solution is not the fastest for all platforms/compilers? Tue?

Even though there is a regression in performance with clang-11, it still looks good for me. You can go ahead with this patch.

This revision was automatically updated to reflect the committed changes.