This is an archive of the discontinued LLVM Phabricator instance.

[libc] Change sinf range reduction to mod pi/16 to be shared with cosf.
ClosedPublic

Authored by lntue on Jul 27 2022, 5:34 AM.

Details

Summary

Change sinf range reduction to mod pi/16 to be shared with cosf.

Previously, sinf used range reduction mod pi, but this cannot be used to implement cosf since the minimax algorithm for cosf does not converge due to critical points at pi/2. In order to be able to share the same range reduction functions for both sinf and cosf, we change the range reduction to mod pi/16 for the following reasons:

  • The table size is sufficiently small: 32 entries for sin(k * pi/16) with k = 0..31. It could be reduced to 16 entries if we treat the final sign separately, with an extra multiplication at the end.
  • The polynomials' degrees are reduced to 7/8 from 15, with extra computations to combine sin and cos with trig sum equality.
  • The number of exceptional cases reduced to 2 (with FMA) and 3 (without FMA).
  • The latency is reduced while maintaining similar throughput as before.

Diff Detail

Event Timeline

lntue created this revision.Jul 27 2022, 5:34 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 27 2022, 5:34 AM
lntue requested review of this revision.Jul 27 2022, 5:34 AM
lntue edited the summary of this revision. (Show Details)Jul 27 2022, 5:44 AM
zimmermann6 accepted this revision.Jul 27 2022, 6:02 AM

here are the timings I get:

zimmerma@biscotte:~/svn/core-math$ !273
LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh sinf
GNU libc version: 2.33
GNU libc release: release
16.784
23.823
14.114
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh sinf
GNU libc version: 2.33
GNU libc release: release
47.889
57.795
52.998
This revision is now accepted and ready to land.Jul 27 2022, 6:02 AM
orex added inline comments.Jul 27 2022, 8:38 AM
libc/src/math/generic/range_reduction.h
82

From my point of view, this line can be changed to static_cast<int64_t>(k_hi + k_low), because k_hi and k_low are already "integer", so you can do one static cast instead of two. Probably it can increase performance.

lntue added inline comments.Jul 27 2022, 8:47 AM
libc/src/math/generic/range_reduction.h
82

This is actually a must, since there are inputs which makes k_hi < 2^54 and k_hi + k_lo > 2^54, causing rounding errors on the last integral bits due to rounding. If we work it out carefully and adjust a bit, we might be able to avoid these rounding errors. But for simplicity, I went with casting both to int64 in this patch.

This revision was landed with ongoing or failed builds.Jul 27 2022, 9:23 AM
This revision was automatically updated to reflect the committed changes.
orex added inline comments.Jul 27 2022, 9:29 AM
libc/src/math/generic/range_reduction.h
82

Sorry be intrusive, but below you convert the result to int. Does such behavior is covered by C/C++ standard? When you tries to push signed value to type which can't hold it...

lntue added inline comments.Jul 27 2022, 10:17 AM
libc/src/math/generic/range_reduction.h
82

I probably mixing up int and int64_t when moving things around. But it is well-defined as truncation for 2-complement representations, and guaranteed to be equal modulo 2^(bit size of int). So the end results are unchanged for us.

orex added inline comments.Jul 27 2022, 10:26 AM
libc/src/math/generic/range_reduction.h
82

Sorry again, but according to this https://en.cppreference.com/w/cpp/language/implicit_conversion it is well defined only in C++20, but as I remember well we are aiming to C++17.

lntue added inline comments.Jul 27 2022, 10:59 AM
libc/src/math/generic/range_reduction.h
82

Implementation-defined is not UB, so we are fine as long as clang and gcc have the same convention. But anyway, feel free to remove the implicit conversion to int below to make it strictly conform to C++17.

orex added inline comments.Jul 27 2022, 11:03 AM
libc/src/math/generic/range_reduction.h
82

If you know, that it is OK, I have no problem with it. Thank you.

santoshn added inline comments.Jul 27 2022, 11:05 AM
libc/src/math/generic/range_reduction.h
82

The primary thing that is needed for this range reduction is the last few bits of $k$. So a modulo operation or a mask will make the integer fit in uint32_t and avoid UB or implementation dependent behavior.