Implement asinf function correctly rounded for all rounding modes.
For |x| <= 0.5, we approximate asin(x) by
asin(x) = x * P(x^2)
where P(X^2) = Q(X) is a degree-20 minimax even polynomial approximating
asin(x)/x on [0, 0.5] generated by Sollya with:
> Q = fpminimax(asin(x)/x, [|0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20|], [|1, D...|], [0, 0.5]);
When |x| > 0.5, we perform range reduction as follow:
Assume further that 0.5 < x <= 1, and let:
y = asin(x)
We will use the double angle formula:
cos(2X) = 1 - 2 sin^2(X)
and the complement angle identity:
x = sin(y) = cos(pi/2 - y) = 1 - 2 sin^2 (pi/4 - y/2)
So:
sin(pi/4 - y/2) = sqrt( (1 - x)/2 )
And hence:
pi/4 - y/2 = asin( sqrt( (1 - x)/2 ) )
Equivalently:
asin(x) = y = pi/2 - 2 * asin( sqrt( (1 - x)/2 ) )
Let u = (1 - x)/2, then
asin(x) = pi/2 - 2 * asin(u)
Moreover, since 0.5 < x <= 1,
0 <= u < 1/4, and 0 <= sqrt(u) < 0.5.
And hence we can reuse the same polynomial approximation of asin(x) when
|x| <= 0.5:
asin(x) = pi/2 - 2 * u * P(u^2).
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf CORE-MATH reciprocal throughput : 23.418 System LIBC reciprocal throughput : 27.310 LIBC reciprocal throughput : 22.741 $ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency GNU libc version: 2.35 GNU libc release: stable CORE-MATH latency : 58.884 System LIBC latency : 62.055 LIBC latency : 62.037