Implement asinf function correctly rounded for all rounding modes.

For `|x| <= 0.5`, we approximate `asin(x)` by

asin(x) = x * P(x^2)

where `P(X^2) = Q(X)` is a degree-20 minimax even polynomial approximating

`asin(x)/x` on `[0, 0.5]` generated by Sollya with:

> Q = fpminimax(asin(x)/x, [|0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20|], [|1, D...|], [0, 0.5]);

When `|x| > 0.5`, we perform range reduction as follow:

Assume further that `0.5 < x <= 1`, and let:

y = asin(x)

We will use the double angle formula:

cos(2X) = 1 - 2 sin^2(X)

and the complement angle identity:

x = sin(y) = cos(pi/2 - y) = 1 - 2 sin^2 (pi/4 - y/2)

So:

sin(pi/4 - y/2) = sqrt( (1 - x)/2 )

And hence:

pi/4 - y/2 = asin( sqrt( (1 - x)/2 ) )

Equivalently:

asin(x) = y = pi/2 - 2 * asin( sqrt( (1 - x)/2 ) )

Let `u = (1 - x)/2`, then

asin(x) = pi/2 - 2 * asin(u)

Moreover, since `0.5 < x <= 1`,

0 <= u < 1/4, and 0 <= sqrt(u) < 0.5.

And hence we can reuse the same polynomial approximation of `asin(x)` when

`|x| <= 0.5`:

asin(x) = pi/2 - 2 * u * P(u^2).

Performance benchmark using `perf` tool from the CORE-MATH project on Ryzen 1700:

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf CORE-MATH reciprocal throughput : 23.418 System LIBC reciprocal throughput : 27.310 LIBC reciprocal throughput : 22.741 $ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency GNU libc version: 2.35 GNU libc release: stable CORE-MATH latency : 58.884 System LIBC latency : 62.055 LIBC latency : 62.037