This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] Improved performance of exp2f function.
ClosedPublic

Authored by orex on Jul 1 2022, 10:04 AM.

Details

Summary

New exp2 function algorithm:

  1. Improved performance: 19.211 vs 29.002 previously, as measured by the core-math perf tool.
  2. Improved accuracy: only two special values remain.
  3. Lookup table size reduced by a factor of two.
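
For readers unfamiliar with table-driven exponentials, the general shape of such an algorithm can be sketched as below. This is only an illustration and not the code from this patch: the table size (16 entries), the degree-4 Taylor polynomial, and the names exp2f_sketch and make_table are all assumptions chosen for brevity. It handles no special cases (NaN, overflow, underflow) and makes no claim of correct rounding.

```cpp
#include <array>
#include <cmath>

// Hypothetical table size for this sketch: 2^4 entries of 2^(i/16).
constexpr int TABLE_BITS = 4;
constexpr int TABLE_SIZE = 1 << TABLE_BITS;

static std::array<double, TABLE_SIZE> make_table() {
  std::array<double, TABLE_SIZE> t{};
  for (int i = 0; i < TABLE_SIZE; ++i)
    t[i] = std::exp2(static_cast<double>(i) / TABLE_SIZE);
  return t;
}

// Sketch of exp2f via the split x = hi + idx/16 + r, with |r| <= 1/32:
//   2^x = 2^hi * 2^(idx/16) * 2^r
// Assumes the default round-to-nearest mode and a finite, moderate x.
float exp2f_sketch(float x) {
  static const std::array<double, TABLE_SIZE> table = make_table();

  double xd = x;
  double kd = std::nearbyint(xd * TABLE_SIZE);  // nearest multiple of 1/16
  double r = xd - kd / TABLE_SIZE;              // reduced argument
  int hi = static_cast<int>(std::floor(kd / TABLE_SIZE));
  int idx = static_cast<int>(kd) - hi * TABLE_SIZE;  // in [0, 15]

  // 2^r = e^(r * ln 2), approximated by a degree-4 Taylor polynomial.
  double t = r * 0.6931471805599453;  // ln 2
  double p = 1.0 + t * (1.0 + t * (0.5 + t * (1.0 / 6.0 + t * (1.0 / 24.0))));

  // Scale the table value by 2^hi.
  return static_cast<float>(std::ldexp(table[idx] * p, hi));
}
```

A smaller table shrinks the memory footprint but widens the range of r, so the polynomial has to cover a larger interval; the trade-off chosen in the actual patch may differ from the one assumed here.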

Latest version benchmark:

GNU libc version: 2.31
GNU libc release: stable
8.009
5.610
8.176

Diff Detail

Event Timeline

orex created this revision. Jul 1 2022, 10:04 AM
Herald added projects: Restricted Project, Restricted Project. Jul 1 2022, 10:04 AM
orex requested review of this revision. Jul 1 2022, 10:04 AM

I confirm the new function is correctly rounded (for all rounding modes). As for efficiency, here is what I get on an AMD EPYC 7282 with gcc 10.2.1 and clang 11.0.1-2:

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh exp2f
GNU libc version: 2.31
GNU libc release: stable
9.720
6.228
18.081


zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc-bec8dff.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh exp2f
GNU libc version: 2.31
GNU libc release: stable
9.714
6.215
23.455

The first run is with this patch, the second one with revision bec8dff of llvm-libc: the reciprocal throughput decreased from 23 to 18 cycles.

orex updated this revision to Diff 448055. Jul 27 2022, 8:29 AM

Improved performance by using the fputil::nearest_integer function.
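
As background, one common way to round to the nearest integer without a library call is the "magic constant" trick sketched below. This is an assumption about the kind of technique involved, not a copy of the fputil helper, which may instead use a hardware rounding instruction where one is available. The name nearest_integer_sketch is hypothetical, and the trick is only valid in the default round-to-nearest mode for |x| < 2^51.

```cpp
// Round a double to the nearest integer (ties to even) by adding and then
// subtracting 1.5 * 2^52. At that magnitude the spacing between adjacent
// doubles is exactly 1, so the addition itself performs the rounding.
// Assumes round-to-nearest mode and |x| < 2^51.
double nearest_integer_sketch(double x) {
  constexpr double MAGIC = 6755399441055744.0;  // 1.5 * 2^52
  // volatile keeps the compiler from simplifying (x + MAGIC) - MAGIC to x
  // under aggressive floating-point optimization flags.
  volatile double shifted = x + MAGIC;
  return shifted - MAGIC;
}
```

For example, nearest_integer_sketch(2.5) yields 2.0 and nearest_integer_sketch(3.5) yields 4.0, since ties go to the even neighbor.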

orex edited the summary of this revision. Jul 27 2022, 8:31 AM

Overall the improvement looks good to me. Let's wait for @zimmermann6 to double-check the numbers.

libc/src/math/generic/CMakeLists.txt
504

Missing the nearest_integer dependency.

libc/src/math/generic/exp2f.cpp
14

You should also add the multiply_add header and dependency directly.

orex updated this revision to Diff 448122. Jul 27 2022, 12:06 PM

Added explicit dependencies.

orex marked 2 inline comments as done. Jul 27 2022, 12:07 PM

OK, thanks! Let's wait.

zimmermann6 accepted this revision. Jul 28 2022, 12:19 AM

I confirm the reciprocal throughput decreased from 18 to 10 cycles (on the same machine as above):

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh exp2f
GNU libc version: 2.33
GNU libc release: release
9.728
7.085
10.040
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh exp2f
GNU libc version: 2.33
GNU libc release: release
37.106
29.520
48.515
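
The two runs above report different quantities: reciprocal throughput (independent calls that the CPU may overlap) and latency (each call waits for the previous result). A rough sketch of how the two can be measured is given below. It is not how the core-math perf tool is implemented: it times std::exp2 with std::chrono in nanoseconds rather than rdtsc cycles, and the names throughput_ns and latency_ns are hypothetical. Both functions assume a non-empty input vector of moderate values.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Average time per call when calls are independent of each other, so the
// CPU is free to overlap them (approximates reciprocal throughput).
double throughput_ns(const std::vector<float> &inputs) {
  volatile float sink = 0.0f;  // keeps the calls from being optimized away
  auto start = std::chrono::steady_clock::now();
  for (float x : inputs)
    sink = std::exp2(x);
  auto stop = std::chrono::steady_clock::now();
  (void)sink;
  std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(inputs.size());
}

// Average time per call when each argument depends on the previous result,
// so successive calls cannot overlap (approximates latency).
double latency_ns(const std::vector<float> &inputs) {
  float prev = 0.0f;
  auto start = std::chrono::steady_clock::now();
  for (float x : inputs) {
    // prev * 0.0f is zero for finite prev, but it creates a data dependency
    // that the compiler must keep under strict floating-point semantics.
    prev = std::exp2(x + prev * 0.0f);
  }
  auto stop = std::chrono::steady_clock::now();
  volatile float sink = prev;
  (void)sink;
  std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(inputs.size());
}
```

On an out-of-order CPU the latency figure is normally the larger of the two, which matches the pattern in the numbers above.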

Good work Kirill!

This revision is now accepted and ready to land. Jul 28 2022, 12:19 AM
orex updated this revision to Diff 448258. Jul 28 2022, 1:21 AM

Merged with the latest main; some cosmetic changes.

This revision was automatically updated to reflect the committed changes.