This is an archive of the discontinued LLVM Phabricator instance.

Differential D130629

[libc] Change sinf range reduction to mod pi/16 to be shared with cosf.
Closed · Public

Authored by lntue on Jul 27 2022, 5:34 AM.

Details

Reviewers

michaelrj
sivachandra
orex
santoshn
zimmermann6

Commits

rG15b9380dfd4e: [libc] Change sinf range reduction to mod pi/16 to be shared with cosf.

Summary

Change sinf range reduction to mod pi/16 to be shared with cosf.

Previously, sinf used range reduction mod pi, but that reduction cannot be reused for cosf: the minimax algorithm for cosf does not converge because of the critical points at pi/2. To share the same range reduction functions between sinf and cosf, we change the range reduction to mod pi/16, for the following reasons:

- The table size is sufficiently small: 32 entries for sin(k * pi/16) with k = 0..31. It could be reduced to 16 entries if we treat the final sign separately, at the cost of an extra multiplication at the end.
- The polynomial degrees are reduced from 15 to 7/8, with extra computations to combine sin and cos via the angle-sum identity.
- The number of exceptional cases is reduced to 2 (with FMA) and 3 (without FMA).
- The latency is reduced while throughput stays similar to before.
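
The scheme described in the summary can be sketched as follows. This is a minimal illustration, not the code from the patch: `sin_sketch` and `sin_table` are hypothetical names, the table is filled with `std::sin` rather than the precomputed constants, the reduction is the naive one (adequate only for moderate |x|), and short Taylor polynomials stand in for the minimax polynomials.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

constexpr double PI = 3.14159265358979323846;

// sin(k * pi/16); only k mod 32 matters. Filled with std::sin here, whereas
// the patch uses a precomputed table.
double sin_table(int64_t k) {
  static const std::array<double, 32> table = [] {
    std::array<double, 32> t{};
    for (int i = 0; i < 32; ++i)
      t[i] = std::sin(i * PI / 16);
    return t;
  }();
  // Conversion to unsigned is modular, so negative k is handled too.
  return table[static_cast<uint64_t>(k) & 31];
}

double sin_sketch(double x) {
  // Naive reduction: x * 16/pi = k + y with |y| <= 0.5.
  double scaled = x * (16.0 / PI);
  double kd = std::nearbyint(scaled);
  double y = scaled - kd;
  int64_t k = static_cast<int64_t>(kd);

  // Taylor polynomials in u = y * pi/16, where |u| <= pi/32.
  double u = y * (PI / 16);
  double u2 = u * u;
  double sin_u = u * (1 - u2 / 6 * (1 - u2 / 20 * (1 - u2 / 42)));
  double cos_u = 1 - u2 / 2 * (1 - u2 / 12 * (1 - u2 / 30 * (1 - u2 / 56)));

  // sin(a + u) = sin(a) cos(u) + cos(a) sin(u), with a = k * pi/16 and
  // cos(a) = sin(a + pi/2), i.e. table index k + 8.
  return sin_table(k) * cos_u + sin_table(k + 8) * sin_u;
}
```

Because cos(k * pi/16) is read from the same table at index k + 8, one table serves both terms of the identity, which is what makes the reduction reusable for cosf.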

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lntue created this revision. · Jul 27 2022, 5:34 AM

Herald added projects: Restricted Project, Restricted Project. · Jul 27 2022, 5:34 AM

Herald added subscribers: ecnelises, tschuett, mgorny.

lntue requested review of this revision. · Jul 27 2022, 5:34 AM

Harbormaster completed remote builds in B177841: Diff 448011. · Jul 27 2022, 5:39 AM

lntue edited the summary of this revision. · Jul 27 2022, 5:44 AM

Here are the timings I get:

zimmerma@biscotte:~/svn/core-math$ !273
LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh sinf
GNU libc version: 2.33
GNU libc release: release
16.784
23.823
14.114
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh sinf
GNU libc version: 2.33
GNU libc release: release
47.889
57.795
52.998

This revision is now accepted and ready to land. · Jul 27 2022, 6:02 AM

orex added inline comments. · Jul 27 2022, 8:38 AM

libc/src/math/generic/range_reduction.h
Line 82: From my point of view, this line can be changed to `static_cast<int64_t>(k_hi + k_lo)`, because `k_hi` and `k_lo` are already integer-valued, so one static cast suffices instead of two. That might improve performance.

lntue added inline comments. · Jul 27 2022, 8:47 AM

libc/src/math/generic/range_reduction.h
Line 82: The two casts are actually required, since there are inputs for which `k_hi < 2^54` and `k_hi + k_lo > 2^54`, so the double-precision sum rounds away the last integral bits. If we work it out carefully and adjust a bit, we might be able to avoid these rounding errors. But for simplicity, I went with casting both to int64 in this patch.
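
The rounding concern in this reply can be reproduced in isolation. The sketch below uses two hypothetical helpers and the value 2^53, the point at which consecutive doubles are first 2 apart; the comment quotes 2^54 for the actual inputs, and the sketch does not try to reproduce those.

```cpp
#include <cstdint>

// Convert each integer-valued double first, then add as 64-bit integers.
int64_t sum_after_cast(double k_hi, double k_lo) {
  return static_cast<int64_t>(k_hi) + static_cast<int64_t>(k_lo);
}

// Add in double first, then convert: the addition itself may round.
int64_t cast_after_sum(double k_hi, double k_lo) {
  return static_cast<int64_t>(k_hi + k_lo);
}
```

With `k_hi = 9007199254740992.0` (2^53) and `k_lo = 1.0`, the exact sum 2^53 + 1 is not representable as a double, so the second variant loses the lowest bit while the first keeps it. That lowest bit matters here because the table index is taken from the low bits of k.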

This revision was landed with ongoing or failed builds. · Jul 27 2022, 9:23 AM

Closed by commit rG15b9380dfd4e: [libc] Change sinf range reduction to mod pi/16 to be shared with cosf. (authored by lntue).

This revision was automatically updated to reflect the committed changes.

lntue added a commit: rG15b9380dfd4e: [libc] Change sinf range reduction to mod pi/16 to be shared with cosf.

orex added inline comments. · Jul 27 2022, 9:29 AM

libc/src/math/generic/range_reduction.h
Line 82: Sorry to be intrusive, but below you convert the result to `int`. Is such behavior covered by the C/C++ standard, when you try to push a signed value into a type which can't hold it?

lntue added inline comments. · Jul 27 2022, 10:17 AM

libc/src/math/generic/range_reduction.h
Line 82: I was probably mixing up `int` and `int64_t` when moving things around. But it is well-defined as truncation for two's-complement representations, and guaranteed to be equal modulo `2^(bit size of int)`. So the end results are unchanged for us.

orex added inline comments. · Jul 27 2022, 10:26 AM

libc/src/math/generic/range_reduction.h
Line 82: Sorry again, but according to https://en.cppreference.com/w/cpp/language/implicit_conversion it is well-defined only since C++20, and as far as I remember we are targeting C++17.

lntue added inline comments. · Jul 27 2022, 10:59 AM

libc/src/math/generic/range_reduction.h
Line 82: Implementation-defined is not UB, so we are fine as long as `clang` and `gcc` have the same convention. But anyway, feel free to remove the implicit conversion to `int` below to make it strictly conform to C++17.

orex added inline comments. · Jul 27 2022, 11:03 AM

libc/src/math/generic/range_reduction.h
Line 82: If you know that it is OK, I have no problem with it. Thank you.

santoshn added inline comments. · Jul 27 2022, 11:05 AM

libc/src/math/generic/range_reduction.h
Line 82: The primary thing needed for this range reduction is the last few bits of `k`. So a modulo operation or a mask will make the integer fit in uint32_t and avoid UB or implementation-defined behavior.
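
The masking suggestion could look like the sketch below. `table_index` is a hypothetical helper, not something from the patch; it assumes only the lowest 5 bits of k are needed, as for a 32-entry table.

```cpp
#include <cstdint>

// Reduce k to a table index in [0, 31] before narrowing.
inline uint32_t table_index(int64_t k) {
  // Signed-to-unsigned conversion is defined modulo 2^64 in every C++
  // standard, so the masked value is k mod 32 even for negative k, and it
  // always fits in the narrower type.
  return static_cast<uint32_t>(static_cast<uint64_t>(k) & 31u);
}
```

Since the value is already in [0, 31] when it is narrowed, no implementation-defined signed narrowing takes place.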

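Revision Contents

Path	Size
libc/docs/math.rst	2 lines
libc/src/math/generic/CMakeLists.txt	1 line
libc/src/math/generic/common_constants.h	6 lines
libc/src/math/generic/common_constants.cpp	17 lines
libc/src/math/generic/range_reduction.h	99 lines
libc/src/math/generic/range_reduction_fma.h	119 lines
libc/src/math/generic/sinf.cpp	130 lines
utils/bazel/llvm-project-overlay/libc/BUILD.bazel	1 line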

Diff 448072

libc/docs/math.rst

	| logf | 12 | 10 | 56 | 46 | :math:`[e^{-1}, e]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |			| logf | 12 | 10 | 56 | 46 | :math:`[e^{-1}, e]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
	+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+			+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
	| log10f | 13 | 25 | 57 | 72 | :math:`[e^{-1}, e]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |			| log10f | 13 | 25 | 57 | 72 | :math:`[e^{-1}, e]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
	+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+			+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
	| log1pf | 16 | 33 | 61 | 97 | :math:`[e^{-0.5} - 1, e^{0.5} - 1]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |			| log1pf | 16 | 33 | 61 | 97 | :math:`[e^{-0.5} - 1, e^{0.5} - 1]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
	+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+			+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
	| log2f | 13 | 10 | 57 | 46 | :math:`[e^{-1}, e]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |			| log2f | 13 | 10 | 57 | 46 | :math:`[e^{-1}, e]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
	+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+			+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
	| sinf | 14 | 26 | 65 | 59 | :math:`[-\pi, \pi]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |			| sinf | 13 | 25 | 54 | 57 | :math:`[-\pi, \pi]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
	+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+			+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+

	References			References
	==========			==========

	* `CRLIBM <https://hal-ens-lyon.archives-ouvertes.fr/ensl-01529804/file/crlibm.pdf>`_.			* `CRLIBM <https://hal-ens-lyon.archives-ouvertes.fr/ensl-01529804/file/crlibm.pdf>`_.
	* `RLIBM <https://people.cs.rutgers.edu/~sn349/rlibm/>`_.			* `RLIBM <https://people.cs.rutgers.edu/~sn349/rlibm/>`_.
	* `Sollya <https://www.sollya.org/>`_.			* `Sollya <https://www.sollya.org/>`_.
	* `The CORE-MATH Project <https://core-math.gitlabpages.inria.fr/>`_.			* `The CORE-MATH Project <https://core-math.gitlabpages.inria.fr/>`_.
	* `The GNU C Library (glibc) <https://www.gnu.org/software/libc/>`_.			* `The GNU C Library (glibc) <https://www.gnu.org/software/libc/>`_.
	* `The GNU MPFR Library <https://www.mpfr.org/>`_.			* `The GNU MPFR Library <https://www.mpfr.org/>`_.

libc/src/math/generic/CMakeLists.txt

add_entrypoint_object(		add_entrypoint_object(
sinf		sinf
SRCS		SRCS
sinf.cpp		sinf.cpp
HDRS		HDRS
../sinf.h		../sinf.h
range_reduction.h		range_reduction.h
range_reduction_fma.h		range_reduction_fma.h
DEPENDS		DEPENDS
		.common_constants
libc.include.math		libc.include.math
libc.src.errno.errno		libc.src.errno.errno
libc.src.__support.FPUtil.fputil		libc.src.__support.FPUtil.fputil
libc.src.__support.FPUtil.fma		libc.src.__support.FPUtil.fma
libc.src.__support.FPUtil.multiply_add		libc.src.__support.FPUtil.multiply_add
libc.src.__support.FPUtil.nearest_integer		libc.src.__support.FPUtil.nearest_integer
libc.src.__support.FPUtil.polyeval		libc.src.__support.FPUtil.polyeval
COMPILE_OPTIONS		COMPILE_OPTIONS

libc/src/math/generic/common_constants.h

	extern const double EXP_M1[195];			extern const double EXP_M1[195];

	// Lookup table for exp(m * 2^(-7)) with m = 0, ..., 127.			// Lookup table for exp(m * 2^(-7)) with m = 0, ..., 127.
	// Table is generated with Sollya as follow:			// Table is generated with Sollya as follow:
	// > display = hexadecimal;			// > display = hexadecimal;
	// > for i from 0 to 127 do { D(exp(i / 128)); };			// > for i from 0 to 127 do { D(exp(i / 128)); };
	extern const double EXP_M2[128];			extern const double EXP_M2[128];

				// Lookup table for sin(k * pi / 16) with k = 0, ..., 31.
				// Table is generated with Sollya as follow:
				// > display = hexadecimal;
				// > for k from 0 to 31 do { D(sin(k * pi/16)); };
				extern const double SIN_K_PI_OVER_16[32];

	} // namespace __llvm_libc			} // namespace __llvm_libc

	#endif // LLVM_LIBC_SRC_MATH_GENERIC_COMMON_CONSTANTS_H			#endif // LLVM_LIBC_SRC_MATH_GENERIC_COMMON_CONSTANTS_H

libc/src/math/generic/common_constants.cpp

const double EXP_M2[128] = {		const double EXP_M2[128] = {
0x1.30aaa04e80d05p1, 0x1.330e587b62b28p1, 0x1.3576dce33feadp1,		0x1.30aaa04e80d05p1, 0x1.330e587b62b28p1, 0x1.3576dce33feadp1,
0x1.37e437282d4eep1, 0x1.3a5670ff972edp1, 0x1.3ccd9432682b4p1,		0x1.37e437282d4eep1, 0x1.3a5670ff972edp1, 0x1.3ccd9432682b4p1,
0x1.3f49aa9d30590p1, 0x1.41cabe304cb34p1, 0x1.4450d8f00edd4p1,		0x1.3f49aa9d30590p1, 0x1.41cabe304cb34p1, 0x1.4450d8f00edd4p1,
0x1.46dc04f4e5338p1, 0x1.496c4c6b832dap1, 0x1.4c01b9950a111p1,		0x1.46dc04f4e5338p1, 0x1.496c4c6b832dap1, 0x1.4c01b9950a111p1,
0x1.4e9c56c731f5dp1, 0x1.513c2e6c731d7p1, 0x1.53e14b042f9cap1,		0x1.4e9c56c731f5dp1, 0x1.513c2e6c731d7p1, 0x1.53e14b042f9cap1,
0x1.568bb722dd593p1, 0x1.593b7d72305bbp1,		0x1.568bb722dd593p1, 0x1.593b7d72305bbp1,
};		};

		// Lookup table for sin(k * pi / 16) with k = 0, ..., 31.
		// Table is generated with Sollya as follow:
		// > display = hexadecimal;
		// > for k from 0 to 31 do { D(sin(k * pi/16)); };
		const double SIN_K_PI_OVER_16[32] = {
		0x0.0000000000000p+0, 0x1.8f8b83c69a60bp-3, 0x1.87de2a6aea963p-2,
		0x1.1c73b39ae68c8p-1, 0x1.6a09e667f3bcdp-1, 0x1.a9b66290ea1a3p-1,
		0x1.d906bcf328d46p-1, 0x1.f6297cff75cb0p-1, 0x1.0000000000000p+0,
		0x1.f6297cff75cb0p-1, 0x1.d906bcf328d46p-1, 0x1.a9b66290ea1a3p-1,
		0x1.6a09e667f3bcdp-1, 0x1.1c73b39ae68c8p-1, 0x1.87de2a6aea963p-2,
		0x1.8f8b83c69a60bp-3, 0x0.0000000000000p+0, -0x1.8f8b83c69a60bp-3,
		-0x1.87de2a6aea963p-2, -0x1.1c73b39ae68c8p-1, -0x1.6a09e667f3bcdp-1,
		-0x1.a9b66290ea1a3p-1, -0x1.d906bcf328d46p-1, -0x1.f6297cff75cb0p-1,
		-0x1.0000000000000p+0, -0x1.f6297cff75cb0p-1, -0x1.d906bcf328d46p-1,
		-0x1.a9b66290ea1a3p-1, -0x1.6a09e667f3bcdp-1, -0x1.1c73b39ae68c8p-1,
		-0x1.87de2a6aea963p-2, -0x1.8f8b83c69a60bp-3};

} // namespace __llvm_libc		} // namespace __llvm_libc

libc/src/math/generic/range_reduction.h

	#include "src/__support/FPUtil/except_value_utils.h"			#include "src/__support/FPUtil/except_value_utils.h"
	#include "src/__support/FPUtil/multiply_add.h"			#include "src/__support/FPUtil/multiply_add.h"
	#include "src/__support/FPUtil/nearest_integer.h"			#include "src/__support/FPUtil/nearest_integer.h"

	namespace __llvm_libc {			namespace __llvm_libc {

	namespace generic {			namespace generic {

	static constexpr uint32_t FAST_PASS_BOUND = 0x4c80'0000U; // 2^26			static constexpr uint32_t FAST_PASS_BOUND = 0x4a80'0000U; // 2^22

	static constexpr int N_ENTRIES = 8;			static constexpr int N_ENTRIES = 8;

	// We choose to split bits of 1/pi into 28-bit precision pieces, so that the			// We choose to split bits of 16/pi into 28-bit precision pieces, so that the
	// product of x * ONE_OVER_PI_28[i] is exact.			// product of x * SIXTEEN_OVER_PI_28[i] is exact.
	// These are generated by Sollya with:			// These are generated by Sollya with:
	// > a1 = D(round(1/pi, 28, RN)); a1;			// > a1 = D(round(16/pi, 28, RN)); a1;
	// > a2 = D(round(1/pi - a1, 28, RN)); a2;			// > a2 = D(round(16/pi - a1, 28, RN)); a2;
	// > a3 = D(round(1/pi - a1 - a2, 28, RN)); a3;			// > a3 = D(round(16/pi - a1 - a2, 28, RN)); a3;
	// > a4 = D(round(1/pi - a1 - a2 - a3, 28, RN)); a4;			// > a4 = D(round(16/pi - a1 - a2 - a3, 28, RN)); a4;
	// ...			// ...
	static constexpr double ONE_OVER_PI_28[N_ENTRIES] = {			static constexpr double SIXTEEN_OVER_PI_28[N_ENTRIES] = {
	0x1.45f306ep-2, -0x1.b1bbeaep-33, 0x1.3f84ebp-62, -0x1.7056592p-92,			0x1.45f306ep+2, -0x1.b1bbeaep-29, 0x1.3f84ebp-58, -0x1.7056592p-88,
	0x1.c0db62ap-121, -0x1.4cd8778p-150, -0x1.bef806cp-179, 0x1.63abdecp-209};			0x1.c0db62ap-117, -0x1.4cd8778p-146, -0x1.bef806cp-175, 0x1.63abdecp-205};

	// Exponents of the least significant bits of the corresponding entries in			// Exponents of the least significant bits of the corresponding entries in
	// ONE_OVER_PI_28.			// SIXTEEN_OVER_PI_28.
	static constexpr int ONE_OVER_PI_28_LSB_EXP[N_ENTRIES] = {			static constexpr int SIXTEEN_OVER_PI_28_LSB_EXP[N_ENTRIES] = {
	-29, -60, -86, -119, -148, -175, -205, -235};			-25, -56, -82, -115, -144, -171, -201, -231};

	// Return (k mod 2) and y, where			// Return k and y, where
	// k = round(x / pi) and y = (x / pi) - k.			// k = round(x * 16 / pi) and y = (x * 16 / pi) - k.
	static inline int64_t small_range_reduction(double x, double &y) {			static inline int64_t small_range_reduction(double x, double &y) {
	double prod = x * ONE_OVER_PI_28[0];			double prod = x * SIXTEEN_OVER_PI_28[0];
	double kd = fputil::nearest_integer(prod);			double kd = fputil::nearest_integer(prod);
	y = prod - kd;			y = prod - kd;
	y = fputil::multiply_add(x, ONE_OVER_PI_28[1], y);			y = fputil::multiply_add(x, SIXTEEN_OVER_PI_28[1], y);
	y = fputil::multiply_add(x, ONE_OVER_PI_28[2], y);			y = fputil::multiply_add(x, SIXTEEN_OVER_PI_28[2], y);
	return static_cast<int64_t>(kd);			return static_cast<int64_t>(kd);
	}			}

	// Return k and y, where			// Return k and y, where
	// k = round(x / pi) and y = (x / pi) - k.			// k = round(x * 16 / pi) and y = (x * 16 / pi) - k.
	// For large range, there are at most 2 parts of ONE_OVER_PI_28 contributing to			// For large range, there are at most 2 parts of SIXTEEN_OVER_PI_28 contributing
	// the unit binary digit (k & 1). If the least significant bit of x * the least			// to the lowest 5 binary digits (k & 31). If the least significant bit of
	// significant bit of ONE_OVER_PI_28[i] > 1, we can completely ignore			// x * the least significant bit of SIXTEEN_OVER_PI_28[i] >= 32, we can
	// ONE_OVER_PI_28[i].			// completely ignore SIXTEEN_OVER_PI_28[i].
	static inline int64_t large_range_reduction(double x, int x_exp, double &y) {			static inline int64_t large_range_reduction(double x, int x_exp, double &y) {
	int idx = 0;			int idx = 0;
	y = 0;			y = 0;
	int x_lsb_exp = x_exp - fputil::FloatProperties<float>::MANTISSA_WIDTH;			int x_lsb_exp_m4 = x_exp - fputil::FloatProperties<float>::MANTISSA_WIDTH;

	// Skipping the first parts of 1/pi such that:			// Skipping the first parts of 16/pi such that:
	// LSB of x * LSB of ONE_OVER_PI_28[i] > 1.			// LSB of x * LSB of SIXTEEN_OVER_PI_28[i] >= 32.
	while (x_lsb_exp + ONE_OVER_PI_28_LSB_EXP[idx] > 0)			while (x_lsb_exp_m4 + SIXTEEN_OVER_PI_28_LSB_EXP[idx] > 4)
	++idx;			++idx;

	double prod_hi = x * ONE_OVER_PI_28[idx];			double prod_hi = x * SIXTEEN_OVER_PI_28[idx];
	// Get the integral part of x * ONE_OVER_PI_28[idx]			// Get the integral part of x * SIXTEEN_OVER_PI_28[idx]
	double k_hi = fputil::nearest_integer(prod_hi);			double k_hi = fputil::nearest_integer(prod_hi);
	// Get the fractional part of x * ONE_OVER_PI_28[idx]			// Get the fractional part of x * SIXTEEN_OVER_PI_28[idx]
	double frac = prod_hi - k_hi;			double frac = prod_hi - k_hi;
	double prod_lo = fputil::multiply_add(x, ONE_OVER_PI_28[idx + 1], frac);			double prod_lo = fputil::multiply_add(x, SIXTEEN_OVER_PI_28[idx + 1], frac);
	double k_lo = fputil::nearest_integer(prod_lo);			double k_lo = fputil::nearest_integer(prod_lo);

	// Now y is the fractional parts.			// Now y is the fractional parts.
	y = prod_lo - k_lo;			y = prod_lo - k_lo;
	y = fputil::multiply_add(x, ONE_OVER_PI_28[idx + 2], y);			y = fputil::multiply_add(x, SIXTEEN_OVER_PI_28[idx + 2], y);
	y = fputil::multiply_add(x, ONE_OVER_PI_28[idx + 3], y);			y = fputil::multiply_add(x, SIXTEEN_OVER_PI_28[idx + 3], y);

	return static_cast<int64_t>(k_hi + k_lo);			return static_cast<int64_t>(k_hi) + static_cast<int64_t>(k_lo);
	}			}

	// Exceptional cases.			// Exceptional cases.
	static constexpr int N_EXCEPT_SMALL = 4;			static constexpr int N_EXCEPTS = 3;

	static constexpr fputil::ExceptionalValues<float, N_EXCEPT_SMALL> SmallExcepts{			static constexpr fputil::ExceptionalValues<float, N_EXCEPTS> SinfExcepts{
	/* inputs */ {			/* inputs */ {
	0x3fa7832a, // x = 0x1.4f0654p0			0x3fa7832a, // x = 0x1.4f0654p0
	0x46199998, // x = 0x1.33333p13			0x46199998, // x = 0x1.33333p13
	0x4afdece4, // x = 0x1.fbd9c8p22			0x55cafb2a, // x = 0x1.95f654p44
	0x4c2332e9, // x = 0x1.4665d2p25
	},			},
	/* outputs (RZ, RU offset, RD offset, RN offset) */			/* outputs (RZ, RU offset, RD offset, RN offset) */
	{			{
	{0x3f7741b5, 1, 0, 1}, // x = 0x1.4f0654p0, sin(x) = 0x1.ee836ap-1 (RZ)			{0x3f7741b5, 1, 0, 1}, // x = 0x1.4f0654p0, sin(x) = 0x1.ee836ap-1 (RZ)
	{0xbeb1fa5d, 0, 1, 0}, // x = 0x1.33333p13, sin(x) = -0x1.63f4bap-2 (RZ)			{0xbeb1fa5d, 0, 1, 0}, // x = 0x1.33333p13, sin(x) = -0x1.63f4bap-2 (RZ)
	{0xbf7fb6e0, 0, 1, 1}, // x = 0x1.fbd9c8p22, sin(x) = -0x1.ff6dcp-1 (RZ)
	{0xbf7fffff, 0, 1,
	1}, // x = 0x1.4665d2p25, sin(x) = -0x1.fffffep-1 (RZ)
	}};

	static constexpr int N_EXCEPT_LARGE = 5;

	static constexpr fputil::ExceptionalValues<float, N_EXCEPT_LARGE> LargeExcepts{
	/* inputs */ {
	0x523947f6, // x = 0x1.728fecp37
	0x53b146a6, // x = 0x1.628d4cp40
	0x55cafb2a, // x = 0x1.95f654p44
	0x6a1976f1, // x = 0x1.32ede2p85
	0x77584625, // x = 0x1.b08c4ap111
	},
	/* outputs (RZ, RU offset, RD offset, RN offset) */
	{
	{0xbf12791d, 0, 1,
	1}, // x = 0x1.728fecp37, sin(x) = -0x1.24f23ap-1 (RZ)
	{0xbf7fffff, 0, 1,
	1}, // x = 0x1.628d4cp40, sin(x) = -0x1.fffffep-1 (RZ)
	{0xbf7e7a16, 0, 1,			{0xbf7e7a16, 0, 1,
	1}, // x = 0x1.95f654p44, sin(x) = -0x1.fcf42cp-1 (RZ)			1}, // x = 0x1.95f654p44, sin(x) = -0x1.fcf42cp-1 (RZ)
	{0x3f7fffff, 1, 0, 1}, // x = 0x1.32ede2p85, sin(x) = 0x1.fffffep-1 (RZ)
	{0xbf7fffff, 0, 1,
	1}, // x = 0x1.b08c4ap111, sin(x) = -0x1.fffffep-1 (RZ)
	}};			}};

	} // namespace generic			} // namespace generic

	} // namespace __llvm_libc			} // namespace __llvm_libc

	#endif // LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_H			#endif // LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_H
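
The comment in this header says the 28-bit pieces are chosen so that `x * SIXTEEN_OVER_PI_28[i]` is exact: a float carries 24 significant bits, so the product needs at most 24 + 28 = 52 bits, which fits in a double. The sketch below checks that property and mirrors the small-range step. `product_is_exact` and `reduce_small` are hypothetical names, `std::nearbyint` stands in for `fputil::nearest_integer`, and plain multiply-adds stand in for `fputil::multiply_add`; the three constants are copied from the diff.

```cpp
#include <cmath>

// First three entries of SIXTEEN_OVER_PI_28 from the patch.
constexpr double C0 = 0x1.45f306ep+2;
constexpr double C1 = -0x1.b1bbeaep-29;
constexpr double C2 = 0x1.3f84ebp-58;

// True when x * c incurs no rounding: the fused operation computes the exact
// product minus the rounded product, which is zero only if the two agree.
bool product_is_exact(float x, double c) {
  double xd = x;
  return std::fma(xd, c, -(xd * c)) == 0.0;
}

// Sketch of the non-FMA small-range step: returns k = round(x * 16/pi) and
// stores y = x * 16/pi - k.
long long reduce_small(float x, double &y) {
  double xd = x;
  double prod = xd * C0;            // exact for float inputs
  double kd = std::nearbyint(prod); // default rounding mode: nearest integer
  y = prod - kd;
  y += xd * C1;                     // lower-order pieces refine y
  y += xd * C2;
  return static_cast<long long>(kd);
}
```

For `x = 1.0f`, 16/pi is about 5.093, so `reduce_small` returns k = 5 with y close to 0.093.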

libc/src/math/generic/range_reduction_fma.h

	#include "src/__support/FPUtil/FPBits.h"			#include "src/__support/FPUtil/FPBits.h"
	#include "src/__support/FPUtil/except_value_utils.h"			#include "src/__support/FPUtil/except_value_utils.h"
	#include "src/__support/FPUtil/nearest_integer.h"			#include "src/__support/FPUtil/nearest_integer.h"

	namespace __llvm_libc {			namespace __llvm_libc {

	namespace fma {			namespace fma {

	static constexpr uint32_t FAST_PASS_BOUND = 0x5880'0000U; // 2^50			static constexpr uint32_t FAST_PASS_BOUND = 0x5680'0000U; // 2^46

	// Digits of 1/pi, generated by Sollya with:			// Digits of 16/pi, generated by Sollya with:
	// > a0 = D(1/pi);			// > a0 = D(16/pi);
	// > a1 = D(1/pi - a0);			// > a1 = D(16/pi - a0);
	// > a2 = D(1/pi - a0 - a1);			// > a2 = D(16/pi - a0 - a1);
	// > a3 = D(1/pi - a0 - a1 - a2);			// > a3 = D(16/pi - a0 - a1 - a2);
	static constexpr double ONE_OVER_PI[5] = {			static constexpr double SIXTEEN_OVER_PI[5] = {
	0x1.45f306dc9c883p-2, -0x1.6b01ec5417056p-56, -0x1.6447e493ad4cep-110,			0x1.45f306dc9c883p+2, -0x1.6b01ec5417056p-52, -0x1.6447e493ad4cep-106,
	0x1.e21c820ff28b2p-164, -0x1.508510ea79237p-219};			0x1.e21c820ff28b2p-160, -0x1.508510ea79237p-215};

	// Return k and y, where			// Return k and y, where
	// k = round(x / pi) and y = (x / pi) - k.			// k = round(x * 16 / pi) and y = (x * 16 / pi) - k.
	// Assume x is non-negative.			// Assume x is non-negative.
	static inline int64_t small_range_reduction(double x, double &y) {			static inline int64_t small_range_reduction(double x, double &y) {
	double kd = fputil::nearest_integer(x * ONE_OVER_PI[0]);			double kd = fputil::nearest_integer(x * SIXTEEN_OVER_PI[0]);
	y = fputil::fma(x, ONE_OVER_PI[0], -kd);			y = fputil::fma(x, SIXTEEN_OVER_PI[0], -kd);
	y = fputil::fma(x, ONE_OVER_PI[1], y);			y = fputil::fma(x, SIXTEEN_OVER_PI[1], y);
	return static_cast<int64_t>(kd);			return static_cast<int64_t>(kd);
	}			}

	// Return k and y, where			// Return k and y, where
	// k = round(x / pi) and y = (x / pi) - k.			// k = round(x * 16 / pi) and y = (x * 16 / pi) - k.
	static inline int64_t large_range_reduction(double x, int x_exp, double &y) {			static inline int64_t large_range_reduction(double x, int x_exp, double &y) {
	// 2^50 <= |x| < 2^104			// 2^46 <= |x| < 2^99
	if (x_exp < 103) {			if (x_exp < 99) {
	// - When x < 2^104, the unit bit is contained in the full exact product of			// - When x < 2^99, the full exact product of x * SIXTEEN_OVER_PI[0]
	// x * ONE_OVER_PI[0].			// contains at least one integral bit <= 2^4.
	// - When 2^50 <= |x| < 2^55, the unit bit is contained			// - When 2^46 <= |x| < 2^56, the lowest 5 unit bits are contained
	// in the last 8 bits of double(x * ONE_OVER_PI[0]).			// in the last 10 bits of double(x * SIXTEEN_OVER_PI[0]).
	// - When |x| >= 2^55, the LSB of double(x * ONE_OVER_PI[0]) is at least 2.			// - When |x| >= 2^56, the LSB of double(x * SIXTEEN_OVER_PI[0]) is at least
	fputil::FPBits<double> prod_hi(x * ONE_OVER_PI[0]);			// 32.
	prod_hi.bits &= (x_exp < 55) ? (~0xffULL) : (~0ULL); // |x| < 2^55			fputil::FPBits<double> prod_hi(x * SIXTEEN_OVER_PI[0]);
				prod_hi.bits &= (x_exp < 56) ? (~0xfffULL) : (~0ULL); // |x| < 2^56
	double k_hi = fputil::nearest_integer(static_cast<double>(prod_hi));			double k_hi = fputil::nearest_integer(static_cast<double>(prod_hi));
	double truncated_prod = fputil::fma(x, ONE_OVER_PI[0], -k_hi);			double truncated_prod = fputil::fma(x, SIXTEEN_OVER_PI[0], -k_hi);
	double prod_lo = fputil::fma(x, ONE_OVER_PI[1], truncated_prod);			double prod_lo = fputil::fma(x, SIXTEEN_OVER_PI[1], truncated_prod);
	double k_lo = fputil::nearest_integer(prod_lo);			double k_lo = fputil::nearest_integer(prod_lo);
	y = fputil::fma(x, ONE_OVER_PI[1], truncated_prod - k_lo);			y = fputil::fma(x, SIXTEEN_OVER_PI[1], truncated_prod - k_lo);
	y = fputil::fma(x, ONE_OVER_PI[2], y);			y = fputil::fma(x, SIXTEEN_OVER_PI[2], y);
	y = fputil::fma(x, ONE_OVER_PI[3], y);			y = fputil::fma(x, SIXTEEN_OVER_PI[3], y);

	return static_cast<int64_t>(k_lo);			return static_cast<int64_t>(k_lo);
	}			}

	// - When x >= 2^104, the full exact product of x * ONE_OVER_PI[0] does not			// - When x >= 2^110, the full exact product of x * SIXTEEN_OVER_PI[0] does
	// contain the unit bit, so we can ignore it completely.			// not contain any of the lowest 5 unit bits, so we can ignore it completely.
	// - When 2^104 <= |x| < 2^109, the unit bit is contained			// - When 2^99 <= |x| < 2^110, the lowest 5 unit bits are contained
	// in the last 8 bits of double(x * ONE_OVER_PI[1]).			// in the last 12 bits of double(x * SIXTEEN_OVER_PI[1]).
	// - When |x| >= 2^109, the LSB of double(x * ONE_OVER_PI[1]) is at least 2.			// - When |x| >= 2^110, the LSB of double(x * SIXTEEN_OVER_PI[1]) is at
	fputil::FPBits<double> prod_hi(x * ONE_OVER_PI[1]);			// least 32.
	prod_hi.bits &= (x_exp < 109) ? (~0xffULL) : (~0ULL); // |x| < 2^55			fputil::FPBits<double> prod_hi(x * SIXTEEN_OVER_PI[1]);
				prod_hi.bits &= (x_exp < 110) ? (~0xfffULL) : (~0ULL); // |x| < 2^110
	double k_hi = fputil::nearest_integer(static_cast<double>(prod_hi));			double k_hi = fputil::nearest_integer(static_cast<double>(prod_hi));
	double truncated_prod = fputil::fma(x, ONE_OVER_PI[1], -k_hi);			double truncated_prod = fputil::fma(x, SIXTEEN_OVER_PI[1], -k_hi);
	double prod_lo = fputil::fma(x, ONE_OVER_PI[2], truncated_prod);			double prod_lo = fputil::fma(x, SIXTEEN_OVER_PI[2], truncated_prod);
	double k_lo = fputil::nearest_integer(prod_lo);			double k_lo = fputil::nearest_integer(prod_lo);
	y = fputil::fma(x, ONE_OVER_PI[2], truncated_prod - k_lo);			y = fputil::fma(x, SIXTEEN_OVER_PI[2], truncated_prod - k_lo);
	y = fputil::fma(x, ONE_OVER_PI[3], y);			y = fputil::fma(x, SIXTEEN_OVER_PI[3], y);
	y = fputil::fma(x, ONE_OVER_PI[4], y);			y = fputil::fma(x, SIXTEEN_OVER_PI[4], y);

	return static_cast<int64_t>(k_lo);			return static_cast<int64_t>(k_lo);
	}			}

	// Exceptional cases.			// Exceptional cases.
	static constexpr int N_EXCEPT_SMALL = 9;			static constexpr int N_EXCEPTS = 2;

	static constexpr fputil::ExceptionalValues<float, N_EXCEPT_SMALL> SmallExcepts{			static constexpr fputil::ExceptionalValues<float, N_EXCEPTS> SinfExcepts{
	/* inputs */ {			/* inputs */ {
	0x3fa7832a, // x = 0x1.4f0654p0			0x3fa7832a, // x = 0x1.4f0654p0
	0x40171973, // x = 0x1.2e32e6p1
	0x4096cbe4, // x = 0x1.2d97c8p2
	0x433b7490, // x = 0x1.76e92p7
	0x437ce5f1, // x = 0x1.f9cbe2p7
	0x46199998, // x = 0x1.33333p13
	0x474d246f, // x = 0x1.9a48dep15
	0x4afdece4, // x = 0x1.fbd9c8p22
	0x55cafb2a, // x = 0x1.95f654p44			0x55cafb2a, // x = 0x1.95f654p44
	},			},
	/* outputs (RZ, RU offset, RD offset, RN offset) */			/* outputs (RZ, RU offset, RD offset, RN offset) */
	{			{
	{0x3f7741b5, 1, 0, 1}, // x = 0x1.4f0654p0, sin(x) = 0x1.ee836ap-1 (RZ)			{0x3f7741b5, 1, 0, 1}, // x = 0x1.4f0654p0, sin(x) = 0x1.ee836ap-1 (RZ)
	{0x3f34290f, 1, 0, 1}, // x = 0x1.2e32e6p1, sin(x) = 0x1.68521ep-1 (RZ)
	{0xbf7fffff, 0, 1, 1}, // x = 0x1.2d97c8p2, sin(x) = -0x1.fffffep-1 (RZ)
	{0xbf5cce62, 0, 1, 0}, // x = 0x1.76e92p7, sin(x) = -0x1.b99cc4p-1 (RZ)
	{0x3f7fffff, 1, 0, 1}, // x = 0x1.f9cbe2p7, sin(x) = 0x1.fffffep-1 (RZ)
	{0xbeb1fa5d, 0, 1, 0}, // x = 0x1.33333p13, sin(x) = -0x1.63f4bap-2 (RZ)
	{0x3f7fffff, 1, 0, 1}, // x = 0x1.9a48dep15, sin(x) = 0x1.fffffep-1 (RZ)
	{0xbf7fb6e0, 0, 1, 1}, // x = 0x1.fbd9c8p22, sin(x) = -0x1.ff6dcp-1 (RZ)
	{0xbf7e7a16, 0, 1,			{0xbf7e7a16, 0, 1,
	1}, // x = 0x1.95f654p44, sin(x) = -0x1.fcf42cp-1 (RZ)			1}, // x = 0x1.95f654p44, sin(x) = -0x1.fcf42cp-1 (RZ)
	}};			}};

	static constexpr int N_EXCEPT_LARGE = 5;

	static constexpr fputil::ExceptionalValues<float, N_EXCEPT_LARGE> LargeExcepts{
	/* inputs */ {
	0x5ebcfdde, // x = 0x1.79fbbcp62
	0x5fa6eba7, // x = 0x1.4dd74ep64
	0x6386134e, // x = 0x1.0c269cp72
	0x6a1976f1, // x = 0x1.32ede2p85
	0x727669d4, // x = 0x1.ecd3a8p101
	},
	/* outputs (RZ, RU offset, RD offset, RN offset) */
	{
	{0x3f50622d, 1, 0, 0}, // x = 0x1.79fbbcp62, sin(x) = 0x1.a0c45ap-1 (RZ)
	{0xbe52464a, 0, 1,
	0}, // x = 0x1.4dd74ep64, sin(x) = -0x1.a48c94p-3 (RZ)
	{0x3f7cb2e7, 1, 0, 0}, // x = 0x1.0c269cp72, sin(x) = 0x1.f965cep-1 (RZ)
	{0x3f7fffff, 1, 0, 1}, // x = 0x1.32ede2p85, sin(x) = 0x1.fffffep-1 (RZ)
	{0xbf7a781d, 0, 1,
	0}, // x = 0x1.ecd3a8p101, sin(x) = -0x1.f4f038p-1 (RZ)
	}};

	} // namespace fma			} // namespace fma

	} // namespace __llvm_libc			} // namespace __llvm_libc

	#endif // LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_FMA_H			#endif // LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_FMA_H
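
For readers unfamiliar with the exceptional-value tables above: each output row stores the result bits rounded toward zero (RZ), followed by the offsets to add to those bits under round-up (RU), round-down (RD), and round-to-nearest (RN). The following is a minimal, self-contained sketch of how such a table could be consulted for an odd function. `ExceptTable`, `SINF_EXCEPTS`, and `lookup_odd` are hypothetical names introduced for illustration; this is not the actual `fputil::ExceptionChecker` code, and the swap of the RU/RD offsets for negative inputs is an assumption that follows from sin(-x) = -sin(x).

```cpp
#include <cfenv>
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for fputil::ExceptionalValues<float, N>.
template <int N> struct ExceptTable {
  uint32_t inputs[N];     // bit patterns of |x|
  uint32_t outputs[N][4]; // {RZ bits, RU offset, RD offset, RN offset}
};

// The two FMA-path entries from the new (right-hand) side of the diff.
constexpr ExceptTable<2> SINF_EXCEPTS{
    {0x3fa7832a, 0x55cafb2a},
    {{0x3f7741b5, 1, 0, 1}, {0xbf7e7a16, 0, 1, 1}}};

// Looks up x_abs in the table. On a hit, writes the bit pattern of f(x) for
// the current rounding mode to result_bits and returns true.
template <int N>
bool lookup_odd(const ExceptTable<N> &table, uint32_t x_abs, bool x_sign,
                uint32_t &result_bits) {
  for (int i = 0; i < N; ++i) {
    if (x_abs != table.inputs[i])
      continue;
    uint32_t bits = table.outputs[i][0]; // result rounded toward zero
    switch (std::fegetround()) {
    case FE_UPWARD:
      // For negative x the result is negated, so "up" and "down" swap roles.
      bits += x_sign ? table.outputs[i][2] : table.outputs[i][1];
      break;
    case FE_DOWNWARD:
      bits += x_sign ? table.outputs[i][1] : table.outputs[i][2];
      break;
    case FE_TONEAREST:
      bits += table.outputs[i][3];
      break;
    default: // FE_TOWARDZERO: no adjustment
      break;
    }
    if (x_sign)
      bits ^= 0x80000000U; // odd function: f(-x) = -f(x)
    result_bits = bits;
    return true;
  }
  return false;
}

} // namespace sketch
```

Storing only the RZ bits plus small offsets keeps each table entry compact while still covering all four rounding modes.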

libc/src/math/generic/sinf.cpp

//===-- Single-precision sin function -------------------------------------===//		//===-- Single-precision sin function -------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "src/math/sinf.h"		#include "src/math/sinf.h"
		#include "common_constants.h"
#include "src/__support/FPUtil/BasicOperations.h"		#include "src/__support/FPUtil/BasicOperations.h"
#include "src/__support/FPUtil/FEnvImpl.h"		#include "src/__support/FPUtil/FEnvImpl.h"
#include "src/__support/FPUtil/FPBits.h"		#include "src/__support/FPUtil/FPBits.h"
#include "src/__support/FPUtil/PolyEval.h"		#include "src/__support/FPUtil/PolyEval.h"
#include "src/__support/FPUtil/except_value_utils.h"		#include "src/__support/FPUtil/except_value_utils.h"
#include "src/__support/FPUtil/multiply_add.h"		#include "src/__support/FPUtil/multiply_add.h"
#include "src/__support/common.h"		#include "src/__support/common.h"

#include <errno.h>		#include <errno.h>

#if defined(LIBC_TARGET_HAS_FMA)		#if defined(LIBC_TARGET_HAS_FMA)
#include "range_reduction_fma.h"		#include "range_reduction_fma.h"
// using namespace __llvm_libc::fma;		// using namespace __llvm_libc::fma;
using __llvm_libc::fma::FAST_PASS_BOUND;		using __llvm_libc::fma::FAST_PASS_BOUND;
using __llvm_libc::fma::large_range_reduction;		using __llvm_libc::fma::large_range_reduction;
using __llvm_libc::fma::LargeExcepts;		using __llvm_libc::fma::N_EXCEPTS;
using __llvm_libc::fma::N_EXCEPT_LARGE;		using __llvm_libc::fma::SinfExcepts;
using __llvm_libc::fma::N_EXCEPT_SMALL;
using __llvm_libc::fma::small_range_reduction;		using __llvm_libc::fma::small_range_reduction;
using __llvm_libc::fma::SmallExcepts;
#else		#else
#include "range_reduction.h"		#include "range_reduction.h"
// using namespace __llvm_libc::generic;		// using namespace __llvm_libc::generic;
using __llvm_libc::generic::FAST_PASS_BOUND;		using __llvm_libc::generic::FAST_PASS_BOUND;
using __llvm_libc::generic::large_range_reduction;		using __llvm_libc::generic::large_range_reduction;
using __llvm_libc::generic::LargeExcepts;		using __llvm_libc::generic::N_EXCEPTS;
using __llvm_libc::generic::N_EXCEPT_LARGE;		using __llvm_libc::generic::SinfExcepts;
using __llvm_libc::generic::N_EXCEPT_SMALL;
using __llvm_libc::generic::small_range_reduction;		using __llvm_libc::generic::small_range_reduction;
using __llvm_libc::generic::SmallExcepts;
#endif		#endif

namespace __llvm_libc {		namespace __llvm_libc {

LLVM_LIBC_FUNCTION(float, sinf, (float x)) {		LLVM_LIBC_FUNCTION(float, sinf, (float x)) {
using FPBits = typename fputil::FPBits<float>;		using FPBits = typename fputil::FPBits<float>;
FPBits xbits(x);		FPBits xbits(x);

uint32_t x_u = xbits.uintval();		uint32_t x_u = xbits.uintval();
uint32_t x_abs = x_u & 0x7fff'ffffU;		uint32_t x_abs = x_u & 0x7fff'ffffU;
double xd, y;		double xd = static_cast<double>(x);

// Range reduction:		// Range reduction:
// For |x| > pi/16, we perform range reduction as follows:		// For |x| > pi/16, we perform range reduction as follows:
// Find k and y such that:		// Find k and y such that:
// x = (k + y) * pi		// x = (k + y) * pi/16
// k is an integer		// k is an integer
// |y| < 0.5		// |y| < 0.5
// For small range (|x| < 2^50 when FMA instructions are available, 2^26		// For small range (|x| < 2^46 when FMA instructions are available, 2^22
// otherwise), this is done by performing:		// otherwise), this is done by performing:
// k = round(x * 1/pi)		// k = round(x * 16/pi)
// y = x * 1/pi - k		// y = x * 16/pi - k
// For large range, we will omit all the higher parts of 1/pi such that the		// For large range, we will omit all the higher parts of 16/pi such that the
// least significant bits of their full products with x are larger than 1,		// least significant bits of their full products with x are larger than 31,
// since sin(x + i * 2pi) = sin(x).		// since sin((k + y + 32i) pi/16) = sin(x + i * 2pi) = sin(x).
//		//
// When FMA instructions are not available, we store the digits of 1/pi in		// When FMA instructions are not available, we store the digits of 16/pi in
// chunks of 28-bit precision. This will make sure that the products:		// chunks of 28-bit precision. This will make sure that the products:
// x * ONE_OVER_PI_28[i] are all exact.		// x * SIXTEEN_OVER_PI_28[i] are all exact.
// When FMA instructions are available, we simply store the digits of 1/pi in		// When FMA instructions are available, we simply store the digits of 16/pi in
// chunks of doubles (53-bit of precision).		// chunks of doubles (53-bit of precision).
// So when multiplying by the largest values of single precision, the		// So when multiplying by the largest values of single precision, the
// resulting output should be correct up to 2^(-208 + 128) ~ 2^-80. By the		// resulting output should be correct up to 2^(-208 + 128) ~ 2^-80. By the
// worst-case analysis of range reduction, |y| >= 2^-38, so this should give		// worst-case analysis of range reduction, |y| >= 2^-38, so this should give
// us more than 40 bits of accuracy. For the worst-case estimation of range		// us more than 40 bits of accuracy. For the worst-case estimation of range
// reduction, see for instance:		// reduction, see for instance:
// Elementary Functions by J-M. Muller, Chapter 11,		// Elementary Functions by J-M. Muller, Chapter 11,
// Handbook of Floating-Point Arithmetic by J-M. Muller et al.,		// Handbook of Floating-Point Arithmetic by J-M. Muller et al.,
// Chapter 10.2.		// Chapter 10.2.
//		//
// Once k and y are computed, we then deduce the answer by the sine of sum		// Once k and y are computed, we then deduce the answer by the sine of sum
// formula:		// formula:
// sin(x) = sin((k + y)*pi)		// sin(x) = sin((k + y)*pi/16)
// = sin(y*pi) * cos(k*pi) + cos(y*pi) * sin(k*pi)		// = sin(y*pi/16) * cos(k*pi/16) + cos(y*pi/16) * sin(k*pi/16)
// = (-1)^(k & 1) * sin(y*pi)		// The values of sin(k*pi/16) and cos(k*pi/16) for k = 0..31 are precomputed
// ~ (-1)^(k & 1) * y * P(y^2)		// and stored using a vector of 32 doubles. Sin(y*pi/16) and cos(y*pi/16) are
// where y*P(y^2) is a degree-15 minimax polynomial generated by Sollya		// computed using degree-7 and degree-8 minimax polynomials generated by
// with: > Q = fpminimax(sin(x*pi)/x, [|0, 2, 4, 6, 8, 10, 12, 14|],		// Sollya respectively.
// [|D...|], [0, 0.5]);

// |x| <= pi/16		// |x| <= pi/16
if (x_abs <= 0x3e49'0fdbU) {		if (unlikely(x_abs <= 0x3e49'0fdbU)) {
xd = static_cast<double>(x);

// |x| < 0x1.d12ed2p-12f		// |x| < 0x1.d12ed2p-12f
if (x_abs < 0x39e8'9769U) {		if (unlikely(x_abs < 0x39e8'9769U)) {
if (unlikely(x_abs == 0U)) {		if (unlikely(x_abs == 0U)) {
// For signed zeros.		// For signed zeros.
return x;		return x;
}		}
// When |x| < 2^-12, the relative error of the approximation sin(x) ~ x		// When |x| < 2^-12, the relative error of the approximation sin(x) ~ x
// is:		// is:
// |sin(x) - x| / |sin(x)| < |x^3| / (6|x|)		// |sin(x) - x| / |sin(x)| < |x^3| / (6|x|)
// = x^2 / 6		// = x^2 / 6
[31 unchanged lines not shown]
// > display = hexadecimal;		// > display = hexadecimal;
// > Q = fpminimax(sin(x)/x, [|0, 2, 4, 6, 8|], [|1, D...|], [0, pi/16]);		// > Q = fpminimax(sin(x)/x, [|0, 2, 4, 6, 8|], [|1, D...|], [0, pi/16]);
double result =		double result =
fputil::polyeval(xsq, 1.0, -0x1.55555555554c6p-3, 0x1.1111111085e65p-7,		fputil::polyeval(xsq, 1.0, -0x1.55555555554c6p-3, 0x1.1111111085e65p-7,
-0x1.a019f70fb4d4fp-13, 0x1.718d179815e74p-19);		-0x1.a019f70fb4d4fp-13, 0x1.718d179815e74p-19);
return xd * result;		return xd * result;
}		}

bool x_sign = xbits.get_sign();		using ExceptChecker = typename fputil::ExceptionChecker<float, N_EXCEPTS>;

int64_t k;
xd = static_cast<double>(x);

if (x_abs < FAST_PASS_BOUND) {
using ExceptChecker =
typename fputil::ExceptionChecker<float, N_EXCEPT_SMALL>;
{		{
float result;		float result;
if (ExceptChecker::check_odd_func(SmallExcepts, x_abs, x_sign, result)) {		if (ExceptChecker::check_odd_func(SinfExcepts, x_abs, xbits.get_sign(),
		result))
return result;		return result;
}		}
}

		int k;
		double y;

		if (likely(x_abs < FAST_PASS_BOUND)) {
k = small_range_reduction(xd, y);		k = small_range_reduction(xd, y);
} else {		} else {
// x is inf or nan.		// x is inf or nan.
if (unlikely(x_abs >= 0x7f80'0000U)) {		if (unlikely(x_abs >= 0x7f80'0000U)) {
if (x_abs == 0x7f80'0000U) {		if (x_abs == 0x7f80'0000U) {
errno = EDOM;		errno = EDOM;
fputil::set_except(FE_INVALID);		fputil::set_except(FE_INVALID);
}		}
return x +		return x +
FPBits::build_nan(1 << (fputil::MantissaWidth<float>::VALUE - 1));		FPBits::build_nan(1 << (fputil::MantissaWidth<float>::VALUE - 1));
}		}

using ExceptChecker =
typename fputil::ExceptionChecker<float, N_EXCEPT_LARGE>;
{
float result;
if (ExceptChecker::check_odd_func(LargeExcepts, x_abs, x_sign, result))
return result;
}

k = large_range_reduction(xd, xbits.get_exponent(), y);		k = large_range_reduction(xd, xbits.get_exponent(), y);
}		}

// After range reduction, k = round(x / pi) and y = (x/pi) - k.		// After range reduction, k = round(x * 16 / pi) and y = (x * 16 / pi) - k.
// So k is an integer and -0.5 <= y <= 0.5.		// So k is an integer and -0.5 <= y <= 0.5.
// Then sin(x) = sin(y*pi + k*pi)		// Then sin(x) = sin((k + y)*pi/16)
// = (-1)^(k & 1) * sin(y*pi)		// = sin(y*pi/16) * cos(k*pi/16) + cos(y*pi/16) * sin(k*pi/16)
// ~ (-1)^(k & 1) * y * P(y^2)
// where y*P(y^2) is a degree-15 minimax polynomial generated by Sollya
// with: > P = fpminimax(sin(x*pi)/x, [|0, 2, 4, 6, 8, 10, 12, 14|],
// [|D...|], [0, 0.5]);

constexpr double SIGN[2] = {1.0, -1.0};

double ysq = y * y;		double ysq = y * y;
double result =
y * fputil::polyeval(ysq, 0x1.921fb54442d17p1, -0x1.4abbce625bd4bp2,
0x1.466bc67750a3fp1, -0x1.32d2cce1612b5p-1,
0x1.507832417bce6p-4, -0x1.e3062119b6071p-8,
0x1.e89c7aa14122dp-12, -0x1.625b1709dece6p-16);

return SIGN[k & 1] * result;		// Degree-6 minimax even polynomial for sin(y*pi/16)/y generated by Sollya
// }		// with:
		// > Q = fpminimax(sin(y*pi/16)/y, [|0, 2, 4, 6|], [|D...|], [0, 0.5]);
		double sin_y =
		fputil::polyeval(ysq, 0x1.921fb54442d17p-3, -0x1.4abbce6256adp-10,
		0x1.466bc5a5ac6b3p-19, -0x1.32bdcb4207562p-29);
		// Degree-8 minimax even polynomial for cos(y*pi/16) generated by Sollya with:
		// > P = fpminimax(cos(x*pi/16), [|0, 2, 4, 6, 8|], [|1, D...|], [0, 0.5]);
		// Note that cosm1_y = cos(y*pi/16) - 1.
		double cosm1_y =
		ysq * fputil::polyeval(ysq, -0x1.3bd3cc9be45dcp-6, 0x1.03c1f081b08ap-14,
		-0x1.55d3c6fb0fb6ep-24, 0x1.e1d3d60f58873p-35);

		double sin_k = SIN_K_PI_OVER_16[k & 31];
		// cos(k * pi/16) = sin(k * pi/16 + pi/2) = sin((k + 8) * pi/16).
		// cos_k = y * cos(k * pi/16)
		double cos_k = y * SIN_K_PI_OVER_16[(k + 8) & 31];

		// Combine the results with the sine of sum formula:
		// sin(x) = sin((k + y)*pi/16)
		// = sin(y*pi/16) * cos(k*pi/16) + cos(y*pi/16) * sin(k*pi/16)
		// = sin_y * cos_k + (1 + cosm1_y) * sin_k
		// = sin_y * cos_k + (cosm1_y * sin_k + sin_k)
		return fputil::multiply_add(sin_y, cos_k,
		fputil::multiply_add(cosm1_y, sin_k, sin_k));
}		}

} // namespace __llvm_libc		} // namespace __llvm_libc
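
The new evaluation scheme in `sinf.cpp` can be tried in isolation. The sketch below reuses the polynomial coefficients from the right-hand side of the diff and combines them with the sine-of-sum formula, under several simplifying assumptions that differ from the patch: the `SIN_K_PI_OVER_16` table is replaced by a direct call to `std::sin`, the range reduction is a plain double-precision multiply and round (adequate only for moderately sized inputs), and exceptional values, infinities, and NaNs are not handled. The names `horner4`, `sin_k_pi_over_16`, and `sin_mod_pi_over_16` are hypothetical. Hexadecimal floating-point literals require C++17.

```cpp
#include <cmath>
#include <cstdint>

namespace sketch {

constexpr double PI = 0x1.921fb54442d18p1;

// Horner evaluation of c0 + c1*x + c2*x^2 + c3*x^3.
inline double horner4(double x, double c0, double c1, double c2, double c3) {
  return c0 + x * (c1 + x * (c2 + x * c3));
}

// Assumption: stands in for the precomputed 32-entry SIN_K_PI_OVER_16 table.
inline double sin_k_pi_over_16(int64_t k) {
  return std::sin(static_cast<double>(k & 31) * (PI / 16.0));
}

// sin(x) via reduction mod pi/16; intended for moderate |x| only.
inline double sin_mod_pi_over_16(double x) {
  // Naive range reduction: x = (k + y) * pi/16 with |y| <= 0.5.
  double t = x * (16.0 / PI);
  double kd = std::round(t);
  double y = t - kd;
  int64_t k = static_cast<int64_t>(kd);

  double ysq = y * y;
  // sin(y*pi/16)/y, coefficients taken from the diff.
  double sin_y = horner4(ysq, 0x1.921fb54442d17p-3, -0x1.4abbce6256adp-10,
                         0x1.466bc5a5ac6b3p-19, -0x1.32bdcb4207562p-29);
  // cos(y*pi/16) - 1, coefficients taken from the diff.
  double cosm1_y =
      ysq * horner4(ysq, -0x1.3bd3cc9be45dcp-6, 0x1.03c1f081b08ap-14,
                    -0x1.55d3c6fb0fb6ep-24, 0x1.e1d3d60f58873p-35);

  double sin_k = sin_k_pi_over_16(k);
  // cos(k*pi/16) = sin((k + 8)*pi/16); the factor y cancels the 1/y in sin_y.
  double cos_k = y * sin_k_pi_over_16(k + 8);

  // sin(x) = sin_y * cos_k + (cosm1_y * sin_k + sin_k)
  return std::fma(sin_y, cos_k, std::fma(cosm1_y, sin_k, sin_k));
}

} // namespace sketch
```

Computing `cos(y*pi/16) - 1` rather than `cos(y*pi/16)` lets the final step add the small correction `cosm1_y * sin_k` to `sin_k` in a single multiply-add, which is the grouping shown in the comment above the `return` statement in the diff.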

utils/bazel/llvm-project-overlay/libc/BUILD.bazel

	[649 unchanged lines not shown]
	)			)

	libc_math_function(			libc_math_function(
	name = "sinf",			name = "sinf",
	additional_deps = [			additional_deps = [
	":__support_fputil_fma",			":__support_fputil_fma",
	":__support_fputil_multiply_add",			":__support_fputil_multiply_add",
	":__support_fputil_polyeval",			":__support_fputil_polyeval",
				":common_constants",
	":range_reduction",			":range_reduction",
	],			],
	)			)

	libc_math_function(			libc_math_function(
	name = "sqrt",			name = "sqrt",
	additional_deps = [			additional_deps = [
	":__support_fputil_sqrt",			":__support_fputil_sqrt",
	[428 unchanged lines not shown]