This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] Implement acosf function correctly rounded for all rounding modes.
ClosedPublic

Authored by lntue on Sep 8 2022, 8:04 PM.

Details

Summary

Implement acosf function correctly rounded for all rounding modes.

We perform range reduction as follows:

  • When |x| < 2^(-10), we use cubic Taylor polynomial:
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 / 6.
  • When 2^(-10) <= |x| <= 0.5, we use the same approximation that is used for asinf(x) when |x| <= 0.5:
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 * P(x^2).
  • When 0.5 < x <= 1, we use the double angle formula: cos(2y) = 1 - 2 * sin^2 (y) to reduce to:
acos(x) = 2 * asin( sqrt( (1 - x)/2 ) )
  • When -1 <= x < -0.5, we reduce to the positive case above using the formula:
acos(x) = pi - acos(-x)

Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh acosf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 28.613
System LIBC reciprocal throughput : 29.204
LIBC reciprocal throughput        : 24.271

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 55.554
System LIBC latency : 76.879
LIBC latency        : 62.118

Diff Detail

Event Timeline

lntue created this revision.Sep 8 2022, 8:04 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 8 2022, 8:04 PM
lntue requested review of this revision.Sep 8 2022, 8:04 PM
lntue updated this revision to Diff 458952.Sep 8 2022, 8:46 PM

Fix tests and exceptional values.

lntue edited the summary of this revision. (Show Details)Sep 8 2022, 8:57 PM
zimmermann6 accepted this revision.Sep 9 2022, 1:33 AM

great work! The reciprocal throughput is indeed slightly better than CORE-MATH, and the latency slightly worse:

# reciprocal throughput
GNU libc version: 2.34
GNU libc release: stable
33.819
37.064
29.462
# latency
GNU libc version: 2.34
GNU libc release: stable
54.951
80.046
62.001
This revision is now accepted and ready to land.Sep 9 2022, 1:33 AM
orex accepted this revision.Sep 9 2022, 1:49 AM
orex added inline comments.
libc/src/math/generic/asinf.cpp
17

Please delete this. It is not needed here. My fault.

libc/src/math/generic/inv_trigf_utils.h
95

Don't you think that it is better to put this array to cpp file?

libc/test/src/math/exhaustive/acosf_test.cpp
39

Do you really need this test until inf? Out of range values can be covered by unittests?

lntue added inline comments.Sep 9 2022, 6:46 AM
libc/src/math/generic/asinf.cpp
17

We do need this to set EDOM for out of range inputs. errno.h header is generated separately (it's kind of complete I think) and does not depend on math.h, so it's safe to include and use in math entrypoints. That's not true for math.h constants though, as we are building that header's implementation here.

libc/src/math/generic/inv_trigf_utils.h
95

This is a small table that we do want to inline, and we don't take its address anywhere, so by leaving its definition in the header, it does improve the performance. I got the reciprocal throughput of 24 for leaving its definition here, vs 26 for putting its definition in the cpp file and providing external linkage in the header.

libc/test/src/math/exhaustive/acosf_test.cpp
39

I think it's better to leave the whole range in the committed tests to safe guard against future changes. Ideally (and soon) we will run these tests automatically with the CI's. For manual testing, we can always restrict the the range that we are interested. The main reason that I didn't make other exhaustive tests running full range is that the ulp function comparing to mpfr outputs did not handle NaN properly. It was fixed in https://reviews.llvm.org/D133400, so I think it's better to just test everything in these exhaustive tests.