Implement acosf function correctly rounded for all rounding modes.
We perform range reduction as follows:
- When |x| < 2^(-10), we use the cubic Taylor polynomial:
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 / 6.
- When 2^(-10) <= |x| <= 0.5, we use the same approximation that is used for asinf(x) when |x| <= 0.5:
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 * P(x^2).
- When 0.5 < x <= 1, we use the double angle formula: cos(2y) = 1 - 2 * sin^2 (y) to reduce to:
acos(x) = 2 * asin( sqrt( (1 - x)/2 ) )
- When -1 <= x < -0.5, we reduce to the positive case above using the formula:
acos(x) = pi - acos(-x)
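
As a rough illustration of the case split above, here is a minimal Python sketch in double precision. It is not the actual implementation and is not correctly rounded: the names `acos_sketch`, `asin_small`, and `P` are hypothetical, and `P` is assumed here to be a truncated Taylor series of asin standing in for whatever polynomial the real asinf code uses.

```python
import math

# Assumption: Taylor coefficients of asin, asin(x) = sum_n COEFFS[n] * x^(2n+1).
# The real code would use a tuned polynomial; this is only a stand-in.
N_TERMS = 30
COEFFS = [math.comb(2 * n, n) / (4 ** n * (2 * n + 1)) for n in range(N_TERMS)]


def P(u):
    """Hypothetical P such that asin(x) ~ x + x^3 * P(x^2) for |x| <= 0.5."""
    acc = 0.0
    for c in reversed(COEFFS[1:]):  # Horner evaluation, skipping the x term
        acc = acc * u + c
    return acc


def asin_small(x):
    """asin(x) for |x| <= 0.5 via x + x^3 * P(x^2)."""
    return x + x ** 3 * P(x * x)


def acos_sketch(x):
    """Follows the four cases of the range reduction described above."""
    if math.isnan(x) or abs(x) > 1.0:
        return math.nan
    ax = abs(x)
    if ax < 2.0 ** -10:
        # Cubic Taylor polynomial: pi/2 - x - x^3 / 6
        return math.pi / 2 - x - x ** 3 / 6
    if ax <= 0.5:
        # acos(x) = pi/2 - asin(x)
        return math.pi / 2 - asin_small(x)
    # 0.5 < |x| <= 1: acos(|x|) = 2 * asin(sqrt((1 - |x|) / 2))
    pos = 2.0 * asin_small(math.sqrt((1.0 - ax) / 2.0))
    # Negative inputs: acos(x) = pi - acos(-x)
    return pos if x > 0 else math.pi - pos
```

Note that for 0.5 < |x| <= 1 the reduced argument sqrt((1 - |x|)/2) is below 0.5, which is why the same small-argument asin approximation can be reused there.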
Performance benchmark using the perf tool from the CORE-MATH project on Ryzen 1700:
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh acosf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 28.613
System LIBC reciprocal throughput : 29.204
LIBC reciprocal throughput        : 24.271
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 55.554
System LIBC latency : 76.879
LIBC latency        : 62.118
Please delete this. It is not needed here. My fault.