Implement the exp10f function, correctly rounded for all rounding modes.
Algorithm: perform range reduction to write
10^x = 2^(hi + mid) * 10^lo
where:
hi is an integer,
0 <= mid * 2^5 < 2^5,
-log10(2) / 2^6 <= lo <= log10(2) / 2^6.
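As an illustration only, the range reduction can be sketched in Python as follows. This is not the code from the patch: the function name reduce_exp10, the rounding via round(), and the assumption that mid * 2^5 is an integer (implied by the 32-entry table) are choices made for this sketch, and it works in plain double precision without the extra care a correctly rounded implementation needs.

```python
import math

LOG2_10 = math.log2(10.0)    # log2(10)
LOG10_2 = math.log10(2.0)    # log10(2)

def reduce_exp10(x):
    """Split x so that 10^x = 2^(hi + mid) * 10^lo (hypothetical helper)."""
    # k is the nearest integer to x * 32 * log2(10).
    k = round(x * 32.0 * LOG2_10)
    hi = k // 32             # floor(k / 32), an integer
    mid = (k % 32) / 32      # multiple of 1/32 in [0, 1)
    # Remainder in the exponent of 10; |lo| <= log10(2) / 64 up to rounding.
    lo = x - k * LOG10_2 / 32.0
    return hi, mid, lo

hi, mid, lo = reduce_exp10(1.5)
print(hi, mid, lo)
```

Because k is the nearest integer to x * 32 * log2(10), the distance between the two is at most 1/2, which is what gives the bound on lo.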
The 32 possible values of 2^mid are then stored in a table, and the product
2^hi * 2^mid is computed by adding hi to the exponent field of 2^mid.
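The exponent-field trick can be illustrated with a small Python sketch. The names EXP2_MID and scale_by_pow2 are hypothetical, the table is built with Python's own pow rather than the precomputed constants a real implementation would use, and the sketch assumes the result stays in the normal double-precision range (no overflow or subnormals).

```python
import struct

# 2^(i/32) for i = 0..31, stored as doubles (all values lie in [1, 2)).
EXP2_MID = [2.0 ** (i / 32) for i in range(32)]

def scale_by_pow2(value, hi):
    """Multiply a normal double by 2^hi by editing its exponent field."""
    # Reinterpret the double as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", value))[0]
    # The exponent field starts at bit 52 (after the 52-bit mantissa).
    bits += hi << 52
    return struct.unpack("<d", struct.pack("<q", bits))[0]

print(scale_by_pow2(EXP2_MID[0], 10))  # 2^0 * 2^10 -> 1024.0
```

The integer addition replaces a floating-point multiplication and is exact as long as the exponent does not leave the normal range.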
10^lo is then approximated by a degree-5 minimax polynomial of the form 1 + lo * P(lo), where the degree-4 polynomial P is generated by Sollya with:
> P = fpminimax((10^x - 1)/x, 4, [|D...|], [-log10(2)/64, log10(2)/64]);
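The shape of this evaluation can be sketched in Python. The Sollya coefficients are not listed here, so the sketch substitutes the Taylor coefficients of (10^x - 1)/x, namely ln(10)^(k+1) / (k+1)!, as a stand-in; the names COEFFS and exp10_small are hypothetical. A minimax polynomial would spread the error more evenly over the interval, but the structure 1 + lo * P(lo) with Horner's scheme is the same.

```python
import math

LN10 = math.log(10.0)

# Stand-in coefficients: Taylor series of (10^x - 1)/x up to degree 4.
# These are NOT the Sollya minimax coefficients.
COEFFS = [LN10 ** (k + 1) / math.factorial(k + 1) for k in range(5)]

def exp10_small(lo):
    """Approximate 10^lo as 1 + lo * P(lo) for small |lo|."""
    p = 0.0
    for c in reversed(COEFFS):   # Horner's scheme, highest degree first
        p = p * lo + c
    return 1.0 + lo * p

bound = math.log10(2.0) / 64     # edge of the reduced interval
print(exp10_small(bound), 10.0 ** bound)
```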
Performance benchmark using the perf tool from the CORE-MATH project on a Ryzen 1700:
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 10.215
System LIBC reciprocal throughput : 7.944
LIBC reciprocal throughput        : 38.538
LIBC reciprocal throughput        : 12.175   (with `-msse4.2` flag)
LIBC reciprocal throughput        : 9.862    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 40.744
System LIBC latency : 37.546
BEFORE
LIBC latency        : 48.989
LIBC latency        : 44.486   (with `-msse4.2` flag)
LIBC latency        : 40.221   (with `-mfma` flag)
This patch relies on https://reviews.llvm.org/D134002