Use nearest_integer instructions to improve expf performance.
Performance tests with CORE-MATH's perf tool:
Before the patch:
$ ./perf.sh expf
LIBC-location: /home/lnt/experiment/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH reciprocal throughput   : 9.860
System LIBC reciprocal throughput : 7.728
LIBC reciprocal throughput        : 12.363

$ ./perf.sh expf --latency
LIBC-location: /home/lnt/experiment/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH latency   : 42.802
System LIBC latency : 35.941
LIBC latency        : 49.808
After the patch:
$ ./perf.sh expf
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH reciprocal throughput   : 9.441
System LIBC reciprocal throughput : 7.382
LIBC reciprocal throughput        : 8.843

$ ./perf.sh expf --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH latency   : 44.192
System LIBC latency : 37.693
LIBC latency        : 44.145
You should probably also explicitly include multiply add.
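For readers unfamiliar with the technique, here is a minimal sketch of how a nearest-integer rounding and a fused multiply-add might be combined in the range reduction of expf. It is not the code from this patch: the names `Reduced`, `reduce_expf_arg`, and `expf_sketch` are hypothetical, the polynomial is a plain Taylor expansion, and the result is neither correctly rounded nor guarded against overflow, NaN, or infinities. It assumes the default round-to-nearest mode, under which `std::nearbyint` can typically be lowered to a single rounding instruction on targets that have one.

```cpp
#include <cmath>

// Hypothetical helper type: x = k * ln(2) + r, so exp(x) = 2^k * exp(r).
struct Reduced {
  int k;     // nearest integer to x / ln(2)
  double r;  // remainder, roughly within [-ln(2)/2, ln(2)/2]
};

// Hypothetical sketch of the range reduction (assumes finite, moderate x).
inline Reduced reduce_expf_arg(float x) {
  constexpr double LOG2_E = 1.4426950408889634;  // 1 / ln(2)
  constexpr double LN_2 = 0.6931471805599453;    // ln(2)
  double xd = static_cast<double>(x);
  // Round to the nearest integer in the current rounding mode.
  double kd = std::nearbyint(xd * LOG2_E);
  // r = xd - kd * ln(2), computed with a single rounding via fma.
  double r = std::fma(-kd, LN_2, xd);
  return {static_cast<int>(kd), r};
}

// Hypothetical sketch: degree-7 Taylor polynomial for exp(r), Horner form.
inline float expf_sketch(float x) {
  Reduced red = reduce_expf_arg(x);
  constexpr double COEFFS[] = {1.0 / 5040, 1.0 / 720, 1.0 / 120, 1.0 / 24,
                               1.0 / 6,    0.5,       1.0,       1.0};
  double p = 0.0;
  for (double c : COEFFS)
    p = std::fma(p, red.r, c);  // p = p * r + c
  // Scale by 2^k to undo the range reduction.
  return static_cast<float>(std::ldexp(p, red.k));
}
```

Calling `expf_sketch(1.0f)` should land close to `std::exp(1.0f)`; whether the rounding and multiply-add actually compile to single instructions depends on the target and compiler flags.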