Improve the performance of expm1f:
- Rearrange the selection logic for different cases to improve the overall
throughput.
- For large inputs, use the same degree-4 polynomial as expf
(https://reviews.llvm.org/D122418), replacing the previous degree-7 polynomial.
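
For illustration only, here is a minimal sketch of how a degree-4 polynomial can be enough once the input is reduced to a small interval. It is not the code in this patch. The name expm1f_sketch, the 2^-7 reduction step, the use of plain Taylor coefficients, and the std::exp call standing in for a table lookup are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, NOT the LLVM libc implementation.
// Splits x = mid + lo with mid a multiple of 2^-7 and |lo| <= 2^-8, then uses
//   expm1(x) = exp(mid) * P(lo) - 1,
// where P is a degree-4 polynomial evaluated in double precision.
float expm1f_sketch(float x) {
  // Assumption: keep the result inside the finite float range.
  assert(std::isfinite(x) && std::fabs(x) <= 88.0f);

  double xd = static_cast<double>(x);

  // Round x to the nearest multiple of 2^-7.
  double k = std::round(xd * 128.0);
  double mid = k * 0.0078125; // k * 2^-7
  double lo = xd - mid;       // small remainder, |lo| <= 2^-8

  // Stand-in for a precomputed table of exp(mid) values (assumption).
  double exp_mid = std::exp(mid);

  // Degree-4 Taylor polynomial of exp(lo) in Horner form. A tuned
  // implementation would likely use minimax coefficients instead.
  double p = 1.0 + lo * (1.0 + lo * (0.5 + lo * (1.0 / 6.0 + lo * (1.0 / 24.0))));

  return static_cast<float>(exp_mid * p - 1.0);
}
```

With |lo| <= 2^-8, the first dropped Taylor term lo^5/120 is below 1e-14, far under single-precision rounding error, which is why a low degree suffices in this sketch.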
Performance benchmark using the perf tool from the CORE-MATH project
(https://gitlab.inria.fr/core-math/core-math/-/tree/master):
Before this patch:
$ ./perf.sh expm1f
CORE-MATH reciprocal throughput   : 15.362
System LIBC reciprocal throughput : 53.288
LIBC reciprocal throughput        : 54.572
$ ./perf.sh expm1f --latency
CORE-MATH latency   : 57.759
System LIBC latency : 147.146
LIBC latency        : 118.057
After this patch:
$ ./perf.sh expm1f
CORE-MATH reciprocal throughput   : 15.359
System LIBC reciprocal throughput : 53.188
LIBC reciprocal throughput        : 14.600
$ ./perf.sh expm1f --latency
CORE-MATH latency   : 57.774
System LIBC latency : 147.119
LIBC latency        : 60.280