This is an archive of the discontinued LLVM Phabricator instance.


Differential D123154

[libc] Implement sinf function that is correctly rounded to all rounding modes.
ClosedPublic

Authored by lntue on Apr 5 2022, 1:14 PM.


Details

Reviewers

michaelrj
sivachandra
zimmermann6

Commits

rGd883a4ad02d8: [libc] Implement sinf function that is correctly rounded to all rounding modes.

Summary

Implement sinf function that is correctly rounded to all rounding modes.

We use a simple range reduction for pi/16 < |x| : Let k = round(x / pi) and y = (x/pi) - k. So k is an integer and -0.5 <= y <= 0.5.

Then

sin(x) = sin(y*pi + k*pi)
          = (-1)^(k & 1) * sin(y*pi)
          ~ (-1)^(k & 1) * y * P(y^2)

where `y*P(y^2)` is a degree-15 minimax polynomial generated by Sollya with:

> P = fpminimax(sin(x*pi)/x, [|0, 2, 4, 6, 8, 10, 12, 14|], [|D...|], [0, 0.5]);
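The reduction above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `sin_via_reduction` is a hypothetical name, `std::sin` stands in for the polynomial `y * P(y^2)` (its coefficients are not listed in this summary), and a single-double constant for 1/pi limits the sketch to moderate |x|. The diffs in this review split the constant across several doubles so that large inputs still reduce accurately.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not code from the patch.
// Computes sin(x) through k = round(x / pi) and y = x / pi - k.
double sin_via_reduction(double x) {
  // The sketch is only meant for moderate |x|.
  assert(std::isfinite(x) && std::fabs(x) < 0x1.0p20);
  constexpr double PI = 3.141592653589793;
  constexpr double ONE_OVER_PI = 0x1.45f306dc9c883p-2; // 1/pi rounded to double
  double scaled = x * ONE_OVER_PI;
  double k = std::round(scaled); // integer-valued
  double y = scaled - k;         // -0.5 <= y <= 0.5
  double s = std::sin(y * PI);   // stand-in for y * P(y^2)
  // (-1)^(k & 1): flip the sign when k is odd.
  return (static_cast<int64_t>(k) & 1) ? -s : s;
}
```

For inputs of modest size the result agrees with `std::sin` to within a few ULPs of double precision; it makes no claim of correct rounding.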

Performance benchmark using the perf tool from the CORE-MATH project (https://gitlab.inria.fr/core-math/core-math/-/tree/master) on a Ryzen 1700:

Before this patch (not correctly rounded):

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf
CORE-MATH reciprocal throughput   : 17.892
System LIBC reciprocal throughput : 25.559
LIBC reciprocal throughput        : 29.381

After this patch (correctly rounded):

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf
CORE-MATH reciprocal throughput   : 17.896
System LIBC reciprocal throughput : 25.740

LIBC reciprocal throughput        : 27.872
LIBC reciprocal throughput        : 20.012     (with `-msse4.2` flag)
LIBC reciprocal throughput        : 14.244     (with `-mfma` flag)

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lntue created this revision. Apr 5 2022, 1:14 PM

Herald added projects: Restricted Project, Restricted Project. Apr 5 2022, 1:14 PM

Herald added subscribers: libc-commits, ecnelises, tschuett, mgorny.

lntue requested review of this revision. Apr 5 2022, 1:14 PM

Harbormaster completed remote builds in B158055: Diff 420614. Apr 5 2022, 1:30 PM

Simplify the range reduction logic and improve performance for |x| < 2^50.

Harbormaster completed remote builds in B158129: Diff 420720. Apr 5 2022, 11:54 PM

lntue edited the summary of this revision. Apr 6 2022, 12:04 AM

lntue added reviewers: michaelrj, sivachandra, zimmermann6.

I get an error for rounding up:

Using llvm-libc
MPFR library: 4.1.0       
MPFR header:  4.1.0 (based on 4.1.0)
Checking function sinf with MPFR_RNDU
libm wrong by up to 3.40e-11 ulp(s) [1] for x=-0x1.47d0fep+34
sin      gives -0x1p+0
mpfr_sin gives -0x1.fffffep-1
Total: errors=1 (0.00%) errors2=0 maxerr=3.40e-11 ulp(s)

This revision now requires changes to proceed. Apr 6 2022, 12:36 AM

Add the exceptional input: x = -0x1.47d0fep+34f.

As a note on why x = 0x1.47d0fep+34f with FE_DOWNWARD passed but
x = -0x1.47d0fep+34f with FE_UPWARD failed: the main difference is that
ysq = y*y differs by 1 ULP, breaking the symmetry sin(-x) = -sin(x)
in the evaluation chain.
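
The asymmetry described here can be reproduced in isolation. The sketch below uses a hypothetical helper, `square_in_mode`, and assumes the platform honors `fesetround` for double multiplication. It squares the same magnitude under `FE_DOWNWARD` and `FE_UPWARD`. Because y*y is non-negative for both y and -y, mirroring the rounding direction of the final result does not mirror it for this intermediate, so the two evaluations can see `ysq` values 1 ULP apart.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: computes y * y with the given rounding mode active.
// The volatile objects keep the compiler from folding the product at compile
// time or moving it past the calls that change the rounding mode.
double square_in_mode(double y, int mode) {
  const int saved = std::fegetround();
  const int rc = std::fesetround(mode);
  assert(rc == 0); // the requested mode must be supported
  (void)rc;
  volatile double vy = y;
  volatile double ysq = vy * vy;
  std::fesetround(saved);
  return ysq;
}
```

With y = 0.1, whose square is not exactly representable, `square_in_mode(0.1, FE_DOWNWARD)` and `square_in_mode(-0.1, FE_UPWARD)` return adjacent doubles; with an exactly representable square such as 0.5 * 0.5 they agree.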

Harbormaster completed remote builds in B158210: Diff 420836. Apr 6 2022, 7:14 AM

All tests pass now, and I get the following figures (1st CORE-MATH, 2nd GNU libc, 3rd LLVM libc):

$ LIBM=/users/zimmerma/svn/core-math/libllvmlibc.a ./perf.sh sinf
38.997
26.503
33.990

This revision is now accepted and ready to land. Apr 6 2022, 7:51 AM

Add support for non-FMA targets and simplify large range branch.

Harbormaster completed remote builds in B174431: Diff 443322. Jul 8 2022, 12:32 PM

Update math status page.

Harbormaster completed remote builds in B174434: Diff 443326. Jul 8 2022, 12:42 PM

I confirm the new version is correctly rounded (on the machine I tried it). However I find different figures for the number of cycles:

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh sinf
GNU libc version: 2.31
GNU libc release: stable
16.781
23.443
32.737

This is on an AMD EPYC 7282 with gcc version 10.2.1 and clang 11.0.1-2 (I guess llvm-libc is compiled with clang). This gives 17 cycles for the core-math routine, and 33 cycles for the llvm-libc one.

lntue mentioned this in D129776: [libc] Add nearest integer instructions to fputil. Jul 14 2022, 7:45 AM

Improve the performance with round-to-nearest-integer instructions. The parts
adding nearest_integer.h have been split out into https://reviews.llvm.org/D129776.

Harbormaster completed remote builds in B175407: Diff 444660. Jul 14 2022, 7:54 AM

lntue edited the summary of this revision. Jul 14 2022, 8:07 AM

Update bazel overlay.

Harbormaster completed remote builds in B175418: Diff 444675. Jul 14 2022, 8:31 AM

lntue mentioned this in rG0f782b84cba5: [libc] Add nearest integer instructions to fputil. Jul 14 2022, 10:20 AM

Sync to HEAD.

Harbormaster completed remote builds in B175441: Diff 444713. Jul 14 2022, 10:38 AM

Add extra exceptional values from AARCH64.

Harbormaster completed remote builds in B175733: Diff 445107. Jul 15 2022, 12:52 PM

lntue mentioned this in D129918: [libc] Add utility classes for checking exceptional values. Jul 15 2022, 8:33 PM

I confirm the latest version is correctly rounded for all rounding modes on the machine I tried (AMD EPYC 7282 with gcc 10.2.1 and clang 11.0.1-2).
For the reciprocal throughput I get:

zimmerma@biscotte:/tmp/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_LAUNCHER="/localdisk/zimmerma/glibc-2.35/install/lib/ld-linux-x86-64.so.2 --library-path /localdisk/zimmerma/glibc-2.35/install/lib" CORE_MATH_PERF_MODE=rdtsc ./perf.sh sinf
GNU libc version: 2.35
GNU libc release: stable
16.705
23.636
13.989

i.e., 16.7 cycles for core-math, 23.6 cycles for glibc 2.35, and 14.0 cycles for llvm-libc. Good work!
For the latency the figures are worse than glibc:

zimmerma@biscotte:/tmp/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_LAUNCHER="/localdisk/zimmerma/glibc-2.35/install/lib/ld-linux-x86-64.so.2 --library-path /localdisk/zimmerma/glibc-2.35/install/lib" CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh sinf
GNU libc version: 2.35
GNU libc release: stable
47.926
57.338
62.912

santoshn added a subscriber: santoshn. Jul 20 2022, 7:28 PM

santoshn added inline comments.

libc/src/math/generic/range_reduction.h
33–40 ↗	(On Diff #445107)	I was reviewing this code. I am not able to understand how ONE_OVER_PI_28_LSB_EXP[2] == -86. The third entry in ONE_OVER_PI_28 (i.e., 0x1.3f84ebp-62) has an exponent of -62 and there are 27 fraction bits. So the exponent of the LSB should be -89. Similarly ONE_OVER_PI_28_LSB_EXP[5] is -167? Similarly ONE_OVER_PI_28_LSB_EXP[6] is -206? Similarly ONE_OVER_PI_28_LSB_EXP[7] is -236?

lntue added inline comments. Jul 21 2022, 6:31 AM

libc/src/math/generic/range_reduction.h
33–40 ↗	(On Diff #445107)	I think for `ONE_OVER_PI_28_LSB_EXP[2]` I over-counted a bit. The exact value `0x1.3f84eb p-62` actually has its last 1 bit at the 24th bit after the decimal point (bits 25-27 are all 0's), so the correct value of `ONE_OVER_PI_28_LSB_EXP[2]` should be `-62 - 24 = -86`. Similarly for other values.

santoshn added inline comments. Jul 21 2022, 7:06 AM

libc/src/math/generic/range_reduction.h
33–40 ↗	(On Diff #445107)	Tue, thanks for the clarification. The last hex digit in ONE_OVER_PI_28[2] is "b", which is 1011. Since we are using 28 bits of precision for each entry of ONE_OVER_PI_28, I get -62 - 27 = -89.

santoshn added inline comments. Jul 21 2022, 7:39 AM

libc/src/math/generic/range_reduction.h
33–40 ↗	(On Diff #445107)	According to my calculation, the ONE_OVER_PI_28_LSB_EXP array for the double values of ONE_OVER_PI_28 with a precision of 28 should be:

static const double one_over_pi_28_exp[8] = {
    -29,  // -2 - 27 = -29
    -60,  // -33 - 27 = -60
    -89,  // -62 - 27 = -89, last hex digit is b, which is 1011
    -118, // -92 - 26 = -118, last hex digit is 2, which is 0010
    -147, // -121 - 26 = -147, last hex digit is a, which is 1010
    -174, // -150 - 24 = -174, last hex digit is 8, which is 1000
    -204, // -179 - 25 = -204, last hex digit is c, which is 1100
    -234  // -209 - 25 = -234, last hex digit is c, which is 1100
};

In the RLIBM project, we have a similar range reduction for sinf. We perform two levels of range reduction. The first level is essentially the same as the approach here. The second level approximates sinpi(x) with two polynomials, sinpi and cospi, similar to the description in Section 2 of our PLDI 2021 paper: https://people.cs.rutgers.edu/~sn349/papers/rlibm32-pldi-2021-preprint.pdf

Hence, we think the performance of this sinf function can be significantly improved (close to a 2X improvement). We are in the final stages of testing our sinf implementation, which consists of two polynomials, a 3-term, degree-5 polynomial (sinpi) and a 3-term, degree-4 polynomial (cospi), and a table of 512 double values. We will share the link to the implementation here once we are done with our testing. We hope it can be incorporated.

lntue added inline comments. Jul 21 2022, 8:02 AM

libc/src/math/generic/range_reduction.h
33–40 ↗	(On Diff #445107)	Yes, the last hex digit is `b`, but it has only 6 hex digits, and hence `-24` instead of `-27`; a full 7 hex digits will make it `-28` instead. It would be great to be able to reduce the polynomial's degree + exceptional cases. Thanks for letting me know the progress! I'm looking forward to it!
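
The disagreement in this thread comes down to where the lowest set bit of each constant sits. A small sketch that computes it directly is given below; `lsb_exponent` is a hypothetical helper written for this illustration and is not part of range_reduction.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: exponent of the lowest set bit of a finite,
// non-zero double, i.e. the e such that the last 1 bit has weight 2^e.
int lsb_exponent(double v) {
  assert(std::isfinite(v) && v != 0.0);
  int e = 0;
  double m = std::frexp(std::fabs(v), &e); // |v| = m * 2^e, 0.5 <= m < 1
  // m * 2^53 is an integer below 2^53, so this conversion is exact.
  uint64_t bits = static_cast<uint64_t>(std::ldexp(m, 53));
  int trailing_zeros = 0;
  while ((bits & 1) == 0) {
    bits >>= 1;
    ++trailing_zeros;
  }
  return e - 53 + trailing_zeros;
}
```

For the third entry quoted in the thread, `lsb_exponent(0x1.3f84ebp-62)` evaluates to -86, because the literal carries 6 hex digits (24 bits) after the point and its last digit `b` ends in a 1 bit.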

Tue,

Thanks for the information. Here is the correctly rounded sinf from our RLIBM project: https://github.com/rutgers-apl/The-RLIBM-Project/tree/main/experimental/sin

We know it works for round-to-nearest and produces correct results for all inputs. From our testing, it is close to 2X faster than the current LLVM version and about 25% faster than the CORE-MATH version with respect to latency.

Let me know if you have any questions.

In D123154#3669380, @santoshn wrote:

Tue,

Thanks for the information. Here is the correctly rounded sinf from our RLIBM project: https://github.com/rutgers-apl/The-RLIBM-Project/tree/main/experimental/sin

We know it works for round-to-nearest and produces correct results for all inputs. From our testing, it is close to 2X faster than the current LLVM version and about 25% faster than the CORE-MATH version with respect to latency.

Let me know if you have any questions.

Thanks for the link! It's nice to see that we can bring down the overall latency and it has no exceptional values!

In an earlier iteration of this patch, I implemented something similar with the same degree, using k*pi/256 range reduction instead of k*pi/512, with around 5-6 exceptional values: https://reviews.llvm.org/D123154?vs=on&id=420614#toc.
You might want to try a better polynomial generator to see if it can reduce the number of exceptional values, or maybe even bring the number of sub-intervals down to 128?

In this patch I went to the other extreme for range reduction. When I have more time, I'll definitely play around with various ranges for the second range reduction to find the balance between look-up table size, polynomial degrees, and number of exceptional values.

Thanks!

Update comments and fix bazel builds.

Harbormaster completed remote builds in B176997: Diff 446816. Jul 22 2022, 7:04 AM

Closed by commit rGd883a4ad02d8: [libc] Implement sinf function that is correctly rounded to all rounding modes. (authored by lntue). Jul 22 2022, 7:07 AM

This revision was automatically updated to reflect the committed changes.

lntue added a commit: rGd883a4ad02d8: [libc] Implement sinf function that is correctly rounded to all rounding modes..

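Revision Contents

Path                                           Size
libc/src/math/generic/CMakeLists.txt           3 lines
libc/src/math/generic/common_constants.h       12 lines
libc/src/math/generic/common_constants.cpp     59 lines
libc/src/math/generic/sinf.cpp                 271 lines
libc/test/src/math/exhaustive/CMakeLists.txt   4 lines
libc/test/src/math/exhaustive/sinf_test.cpp    74 lines
libc/test/src/math/sinf_test.cpp               42 lines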

Diff 420614

libc/src/math/generic/CMakeLists.txt


	add_entrypoint_object(			add_entrypoint_object(
	sinf			sinf
	SRCS			SRCS
	sinf.cpp			sinf.cpp
	HDRS			HDRS
	../sinf.h			../sinf.h
	DEPENDS			DEPENDS
	.sincosf_utils			.common_constants
	libc.include.math			libc.include.math
	libc.src.errno.errno			libc.src.errno.errno
	libc.src.__support.FPUtil.fputil			libc.src.__support.FPUtil.fputil
	COMPILE_OPTIONS			COMPILE_OPTIONS
	-O3			-O3
				-mfma
	)			)

	add_entrypoint_object(			add_entrypoint_object(
	sincosf			sincosf
	SRCS			SRCS
	sincosf.cpp			sincosf.cpp
	HDRS			HDRS
	../sincosf.h			../sincosf.h

libc/src/math/generic/common_constants.h

	extern const double EXP_M1[195];			extern const double EXP_M1[195];

	// Lookup table for exp(m * 2^(-7)) with m = 0, ..., 127.			// Lookup table for exp(m * 2^(-7)) with m = 0, ..., 127.
	// Table is generated with Sollya as follow:			// Table is generated with Sollya as follow:
	// > display = hexadecimal;			// > display = hexadecimal;
	// > for i from 0 to 127 do { D(exp(i / 128)); };			// > for i from 0 to 127 do { D(exp(i / 128)); };
	extern const double EXP_M2[128];			extern const double EXP_M2[128];

				// Lookup table for sin(k * pi / 256) with k = 0, ..., 128.
				// Table is generated with Sollya as follow:
				// > display = hexadecimal;
				// > for i from 0 to 128 do { D(sin(i * pi / 256)); };
				extern const double SIN_K_PI_OVER_256[129];

				// Digits of 256/pi, generated by Sollya with:
				// > a0 = D(256/pi);
				// > a1 = D(256/pi - a0);
				// > a2 = D(256/pi - a0 - a1);
				// > a3 = D(256/pi - a0 - a1 - a2);
				extern const double TWOFIFTYSIX_OVER_PI[4];
	} // namespace __llvm_libc			} // namespace __llvm_libc

	#endif // LLVM_LIBC_SRC_MATH_GENERIC_COMMON_CONSTANTS_H			#endif // LLVM_LIBC_SRC_MATH_GENERIC_COMMON_CONSTANTS_H

libc/src/math/generic/common_constants.cpp

const double EXP_M2[128] = {
...
0x1.30aaa04e80d05p1, 0x1.330e587b62b28p1, 0x1.3576dce33feadp1,		0x1.30aaa04e80d05p1, 0x1.330e587b62b28p1, 0x1.3576dce33feadp1,
0x1.37e437282d4eep1, 0x1.3a5670ff972edp1, 0x1.3ccd9432682b4p1,		0x1.37e437282d4eep1, 0x1.3a5670ff972edp1, 0x1.3ccd9432682b4p1,
0x1.3f49aa9d30590p1, 0x1.41cabe304cb34p1, 0x1.4450d8f00edd4p1,		0x1.3f49aa9d30590p1, 0x1.41cabe304cb34p1, 0x1.4450d8f00edd4p1,
0x1.46dc04f4e5338p1, 0x1.496c4c6b832dap1, 0x1.4c01b9950a111p1,		0x1.46dc04f4e5338p1, 0x1.496c4c6b832dap1, 0x1.4c01b9950a111p1,
0x1.4e9c56c731f5dp1, 0x1.513c2e6c731d7p1, 0x1.53e14b042f9cap1,		0x1.4e9c56c731f5dp1, 0x1.513c2e6c731d7p1, 0x1.53e14b042f9cap1,
0x1.568bb722dd593p1, 0x1.593b7d72305bbp1,		0x1.568bb722dd593p1, 0x1.593b7d72305bbp1,
};		};

		// Lookup table for sin(k * pi / 256) with k = 0, ..., 128.
		// Table is generated with Sollya as follow:
		// > display = hexadecimal;
		// > for i from 0 to 128 do { D(sin(i * pi / 256)); };
		const double SIN_K_PI_OVER_256[129] = {
		0x0.0000000000000p+0, 0x1.921d1fcdec784p-7, 0x1.92155f7a3667ep-6,
		0x1.2d865759455cdp-5, 0x1.91f65f10dd814p-5, 0x1.f656e79f820e0p-5,
		0x1.2d52092ce19f6p-4, 0x1.5f6d00a9aa419p-4, 0x1.917a6bc29b42cp-4,
		0x1.c3785c79ec2d5p-4, 0x1.f564e56a9730ep-4, 0x1.139f0cedaf577p-3,
		0x1.2c8106e8e613ap-3, 0x1.45576b1293e5ap-3, 0x1.5e214448b3fc6p-3,
		0x1.76dd9de50bf31p-3, 0x1.8f8b83c69a60bp-3, 0x1.a82a025b00451p-3,
		0x1.c0b826a7e4f63p-3, 0x1.d934fe5454311p-3, 0x1.f19f97b215f1bp-3,
		0x1.04fb80e37fdaep-2, 0x1.111d262b1f677p-2, 0x1.1d3443f4cdb3ep-2,
		0x1.294062ed59f06p-2, 0x1.35410c2e18152p-2, 0x1.4135c94176601p-2,
		0x1.4d1e24278e76ap-2, 0x1.58f9a75ab1fddp-2, 0x1.64c7ddd3f27c6p-2,
		0x1.7088530fa459fp-2, 0x1.7c3a9311dcce7p-2, 0x1.87de2a6aea963p-2,
		0x1.9372a63bc93d7p-2, 0x1.9ef7943a8ed8ap-2, 0x1.aa6c82b6d3fcap-2,
		0x1.b5d1009e15cc0p-2, 0x1.c1249d8011ee7p-2, 0x1.cc66e9931c45ep-2,
		0x1.d79775b86e389p-2, 0x1.e2b5d3806f63bp-2, 0x1.edc1952ef78d6p-2,
		0x1.f8ba4dbf89abap-2, 0x1.01cfc874c3eb7p-1, 0x1.073879922ffeep-1,
		0x1.0c9704d5d898fp-1, 0x1.11eb3541b4b23p-1, 0x1.1734d63dedb49p-1,
		0x1.1c73b39ae68c8p-1, 0x1.21a799933eb59p-1, 0x1.26d054cdd12dfp-1,
		0x1.2bedb25faf3eap-1, 0x1.30ff7fce17035p-1, 0x1.36058b10659f3p-1,
		0x1.3affa292050b9p-1, 0x1.3fed9534556d4p-1, 0x1.44cf325091dd6p-1,
		0x1.49a449b9b0939p-1, 0x1.4e6cabbe3e5e9p-1, 0x1.5328292a35596p-1,
		0x1.57d69348ceca0p-1, 0x1.5c77bbe65018cp-1, 0x1.610b7551d2cdfp-1,
		0x1.6591925f0783dp-1, 0x1.6a09e667f3bcdp-1, 0x1.6e74454eaa8afp-1,
		0x1.72d0837efff96p-1, 0x1.771e75f037261p-1, 0x1.7b5df226aafafp-1,
		0x1.7f8ece3571771p-1, 0x1.83b0e0bff976ep-1, 0x1.87c400fba2ebfp-1,
		0x1.8bc806b151741p-1, 0x1.8fbcca3ef940dp-1, 0x1.93a22499263fbp-1,
		0x1.9777ef4c7d742p-1, 0x1.9b3e047f38741p-1, 0x1.9ef43ef29af94p-1,
		0x1.a29a7a0462782p-1, 0x1.a63091b02fae2p-1, 0x1.a9b66290ea1a3p-1,
		0x1.ad2bc9e21d511p-1, 0x1.b090a58150200p-1, 0x1.b3e4d3ef55712p-1,
		0x1.b728345196e3ep-1, 0x1.ba5aa673590d2p-1, 0x1.bd7c0ac6f952ap-1,
		0x1.c08c426725549p-1, 0x1.c38b2f180bdb1p-1, 0x1.c678b3488739bp-1,
		0x1.c954b213411f5p-1, 0x1.cc1f0f3fcfc5cp-1, 0x1.ced7af43cc773p-1,
		0x1.d17e7743e35dcp-1, 0x1.d4134d14dc93ap-1, 0x1.d696173c9e68bp-1,
		0x1.d906bcf328d46p-1, 0x1.db6526238a09bp-1, 0x1.ddb13b6ccc23cp-1,
		0x1.dfeae622dbe2bp-1, 0x1.e212104f686e5p-1, 0x1.e426a4b2bc17ep-1,
		0x1.e6288ec48e112p-1, 0x1.e817bab4cd10dp-1, 0x1.e9f4156c62ddap-1,
		0x1.ebbd8c8df0b74p-1, 0x1.ed740e7684963p-1, 0x1.ef178a3e473c2p-1,
		0x1.f0a7efb9230d7p-1, 0x1.f2252f7763adap-1, 0x1.f38f3ac64e589p-1,
		0x1.f4e603b0b2f2dp-1, 0x1.f6297cff75cb0p-1, 0x1.f7599a3a12077p-1,
		0x1.f8764fa714ba9p-1, 0x1.f97f924c9099bp-1, 0x1.fa7557f08a517p-1,
		0x1.fb5797195d741p-1, 0x1.fc26470e19fd3p-1, 0x1.fce15fd6da67bp-1,
		0x1.fd88da3d12526p-1, 0x1.fe1cafcbd5b09p-1, 0x1.fe9cdad01883ap-1,
		0x1.ff095658e71adp-1, 0x1.ff621e3796d7ep-1, 0x1.ffa72effef75dp-1,
		0x1.ffd886084cd0dp-1, 0x1.fff62169b92dbp-1, 0x1.0000000000000p+0,
		};

		// Digits of 256/pi, generated by Sollya with:
		// > a0 = D(256/pi);
		// > a1 = D(256/pi - a0);
		// > a2 = D(256/pi - a0 - a1);
		// > a3 = D(256/pi - a0 - a1 - a2);
		const double TWOFIFTYSIX_OVER_PI[4] = {
		0x1.45f306dc9c883p6, -0x1.6b01ec5417056p-48, -0x1.6447e493ad4cep-102,
		0x1.e21c820ff28b2p-156};

} // namespace __llvm_libc		} // namespace __llvm_libc
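
A single 129-entry table is enough to supply both sin(k*pi/256) and cos(k*pi/256) for any integer k, because cos(t) = sin(pi/2 - t) and the remaining quadrants differ only in sign. The sketch below mirrors the quadrant logic of `get_sin_cos` in this diff, but it is only an illustration: the table is filled at run time with `std::sin` rather than with the Sollya-generated constants, and `SinCos`, `make_table` and `sin_cos_k` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

struct SinCos {
  double sin_k;
  double cos_k;
};

// Run-time stand-in for SIN_K_PI_OVER_256 (assumption: std::sin is accurate
// enough for a demonstration; the patch uses precomputed constants).
static std::array<double, 129> make_table() {
  constexpr double PI = 3.141592653589793;
  std::array<double, 129> t{};
  for (int i = 0; i <= 128; ++i)
    t[i] = std::sin(i * PI / 256);
  return t;
}

// Returns sin(k*pi/256) and cos(k*pi/256) using only table lookups.
SinCos sin_cos_k(int64_t k) {
  static const std::array<double, 129> table = make_table();
  const int idx = static_cast<int>(k & 0x7f);              // position in quadrant
  const int quadrant = static_cast<int>((k & 0x180) >> 7); // k mod 512, top 2 bits
  assert(quadrant >= 0 && quadrant <= 3);
  switch (quadrant) {
  case 0:
    return {table[idx], table[128 - idx]};
  case 1:
    return {table[128 - idx], -table[idx]};
  case 2:
    return {-table[idx], -table[128 - idx]};
  default:
    return {-table[128 - idx], table[idx]};
  }
}
```

Masking with `0x7f` and `0x180` also handles negative k, since the two's complement bit pattern yields k mod 512.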

libc/src/math/generic/sinf.cpp

	//===-- Single-precision sin function -------------------------------------===//			//===-- Single-precision sin function -------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "src/math/sinf.h"			#include "src/math/sinf.h"
	#include "math_utils.h"			#include "src/__support/FPUtil/BasicOperations.h"
	#include "sincosf_utils.h"			#include "src/__support/FPUtil/FEnvImpl.h"
				#include "src/__support/FPUtil/FMA.h"
				#include "src/__support/FPUtil/FPBits.h"
				#include "src/__support/FPUtil/PolyEval.h"
	#include "src/__support/common.h"			#include "src/__support/common.h"
	#include <math.h>			#include "src/math/generic/common_constants.h"

	#include <stdint.h>			#include <errno.h>

	namespace __llvm_libc {			namespace __llvm_libc {

	// Fast sinf implementation. Worst-case ULP is 0.5607, maximum relative			namespace {
	// error is 0.5303 * 2^-23. A single-step range reduction is used for			static inline void get_sin_cos(int64_t k, double &sin_k, double &cos_k) {
	// small values. Large inputs have their range reduced using fast integer			int idx = k & 0x007f;
	// arithmetic.			int quadrant = (k & 0x0180);
	LLVM_LIBC_FUNCTION(float, sinf, (float y)) {			switch (quadrant >> 7) {
	double x = y;			case 0:
	double s;			sin_k = SIN_K_PI_OVER_256[idx];
	int n;			cos_k = SIN_K_PI_OVER_256[128 - idx];
	const sincos_t *p = &SINCOSF_TABLE[0];			break;
				case 1:
	if (abstop12(y) < abstop12(PIO4)) {			sin_k = SIN_K_PI_OVER_256[128 - idx];
	s = x * x;			cos_k = -SIN_K_PI_OVER_256[idx];
				break;
	if (unlikely(abstop12(y) < abstop12(as_float(0x39800000)))) {			case 2:
	if (unlikely(abstop12(y) < abstop12(as_float(0x800000))))			sin_k = -SIN_K_PI_OVER_256[idx];
	// Force underflow for tiny y.			cos_k = -SIN_K_PI_OVER_256[128 - idx];
	force_eval<float>(s);			break;
	return y;			case 3:
				sin_k = -SIN_K_PI_OVER_256[128 - idx];
				cos_k = SIN_K_PI_OVER_256[idx];
				break;
				default:
				__builtin_unreachable();
				}
	}			}

	return sinf_poly(x, s, p, 0);			} // namespace
	} else if (likely(abstop12(y) < abstop12(120.0f))) {
	x = reduce_fast(x, p, &n);

	// Setup the signs for sin and cos.
	s = p->sign[n & 3];

	if (n & 2)			INLINE_FMA
	p = &SINCOSF_TABLE[1];			LLVM_LIBC_FUNCTION(float, sinf, (float x)) {
				using FPBits = typename fputil::FPBits<float>;
				FPBits xbits(x);

				uint32_t x_u = xbits.uintval();
				uint32_t x_abs = x_u & 0x7fff'ffffU;

				double xd = static_cast<double>(x);

				// Range reduction:
				// For |x| > pi/16, we perform range reduction as follows:
				// Find k and y such that:
				// x = (k + y) * pi/256
				// k is an integer
				// |y| < 1
				// This is done by performing the following computation:
				// k = round(x * 256/pi)
				// y = round(x * 256/pi - k)
				// The digits of 256/pi are stored using 4 doubles. The last double stores
				// digits ranging from 2^(-208) to 2^(-156) of 256/pi, so when multiplying
				// by the largest values of single precision, the resulting output should be
				// correct up to 2^(-208 + 128) ~ 2^-80. By the worst-case analysis of range
				// reduction, |y| >= 2^-38, so this should give us more than 40 bits of
				// accuracy. For the worst-case estimation of range reduction, see for
				// instance:
				// Elementary Functions by J-M. Muller, Chapter 11,
				// Handbook of Floating-Point Arithmetic by J-M. Muller et al.,
				// Chapter 10.2.
				//
				// Once k and y are computed, we then deduce the answer by the sine of sum
				// formula:
				// sin(x) = sin((k + y)*pi/256)
				// = sin(y*pi/256) * cos(k*pi/256) + cos(y*pi/256) * sin(k*pi/256)
				// The values of sin(k*pi/256) and cos(k*pi/256) are precomputed and stored
				// using a vector of 129 doubles. Sin(y*pi/256) and cos(y*pi/256) are computed
				// using degree-5 and degree-6 minimax polynomials generated by Sollya
				// respectively.
				double y, sin_k, cos_k;
				int64_t k;

				// |x| < 2^46
				if (x_abs < 0x5680'0000U) {
				// |x| < 0x1.d12ed2p-12f
				if (x_abs < 0x39e8'9769U) {
				if (unlikely(x_abs == 0U)) {
				// For signed zeros.
				return x;
				}
				// When |x| < 2^-12, the relative error of the approximation sin(x) ~ x
				// is:
				// |sin(x) - x| / |sin(x)| < |x^3| / (6|x|)
				// = x^2 / 6
				// < 2^-25
				// < epsilon(1)/2.
				// So the correctly rounded values of sin(x) are:
				// = x - sign(x)*eps(x) if rounding mode = FE_TOWARDZERO,
				// or (rounding mode = FE_UPWARD and x is
				// negative),
				// = x otherwise.
				// To simplify the rounding decision and make it more efficient, we use
				// fma(x, -2^-25, x) instead.
				// An exhaustive test shows that this formula works correctly for all
				// rounding modes up to |x| < 0x1.c555dep-11f.
				return fputil::fma(x, -0x1.0p-25f, x);
				}

	return sinf_poly(x * s, x * x, p, n);			// |x| <= pi/16
	} else if (abstop12(y) < abstop12(INFINITY)) {			if (x_abs <= 0x3e49'0fdbU) {
	uint32_t xi = as_uint32_bits(y);			double xsq = xd * xd;
	int sign = xi >> 31;
				// Degree-9 polynomial approximation:
				// sin(x) ~ x + a_3 x^3 + a_5 x^5 + a_7 x^7 + a_9 x^9
				// = x (1 + a_3 x^2 + ... + a_9 x^8)
				// = x * P(x^2)
				// generated by Sollya with the following commands:
				// > display = hexadecimal;
				// > Q = fpminimax(sin(x)/x, [|0, 2, 4, 6, 8|], [|1, D...|], [0, pi/16]);
				double result = fputil::polyeval(
				xsq, 1.0, -0x1.55555555554c6p-3, 0x1.1111111085e65p-7,
				-0x1.a019f70fb4d4fp-13, 0x1.718d179815e74p-19);
				return xd * result;
				}

	x = reduce_large(xi, &n);			// Exceptional cases.
				switch (x_abs) {
				case 0x4619'9998U: // |x| = 0x1.33333p+13f
				if (xbits.get_sign()) {
				if (fputil::get_round() == FE_UPWARD)
				return 0x1.63f4bcp-2f;
				return 0x1.63f4bap-2f;
				}
				if (fputil::get_round() == FE_DOWNWARD)
				return -0x1.63f4bcp-2f;
				return -0x1.63f4bap-2f;
				case 0x4afd'ece4U: { // |x| = 0x1.fbd9c8p+22f;
				int rounding = fputil::get_round();
				if (xbits.get_sign()) {
				if (rounding == FE_DOWNWARD || rounding == FE_TOWARDZERO)
				return 0x1.ff6dcp-1f;
				return 0x1.ff6dc2p-1f;
				}
				if (rounding == FE_UPWARD || rounding == FE_TOWARDZERO)
				return -0x1.ff6dcp-1f;
				return -0x1.ff6dc2p-1f;
				}
				case 0x5239'47f6U: { // |x| = 0x1.728fecp+37f;
				int rounding = fputil::get_round();
				if (xbits.get_sign()) {
				if (rounding == FE_DOWNWARD || rounding == FE_TOWARDZERO)
				return 0x1.24f23ap-1f;
				return 0x1.24f23cp-1f;
				}
				if (rounding == FE_UPWARD || rounding == FE_TOWARDZERO)
				return -0x1.24f23ap-1f;
				return -0x1.24f23cp-1f;
				}
				}

	// Setup signs for sin and cos - include original sign.			// Since casting from float to integer is round-toward-zero, we add
	s = p->sign[(n + sign) & 3];			// sign(x)*0.5 before casting to make it round-to-nearest.
				k = static_cast<int64_t>(
				fputil::fma(xd, TWOFIFTYSIX_OVER_PI[0], xbits.get_sign() ? -0.5 : 0.5));
				y = fputil::fma(xd, TWOFIFTYSIX_OVER_PI[0], -static_cast<double>(k));
				y = fputil::fma(xd, TWOFIFTYSIX_OVER_PI[1], y);
				} else {
				// Exceptional values
				if (unlikely(x_abs == 0x6a19'76f1U || x_abs == 0x6f79'be45U)) {
				// |x| = 0x1.32ede2p+85f or |x| = 0x1.f37c8ap+95f
				int rounding = fputil::get_round();
				if (xbits.get_sign()) {
				if (rounding == FE_UPWARD || rounding == FE_TOWARDZERO)
				return -0x1.fffffep-1f;
				return -1.0f;
				}
				if (rounding == FE_DOWNWARD || rounding == FE_TOWARDZERO)
				return 0x1.fffffep-1f;
				return 1.0f;
				}

	if ((n + sign) & 2)			// x is inf or nan.
	p = &SINCOSF_TABLE[1];			if (unlikely(x_abs >= 0x7f80'0000U)) {
				if (x_abs == 0x7f80'0000U)
				errno = EDOM;
				return x +
				FPBits::build_nan(1 << (fputil::MantissaWidth<float>::VALUE - 1));
				}

	return sinf_poly(x * s, x * x, p, n);			// 2^46 <= |x| < 2^98
				if (x_abs < 0x7080'0000U) {
				// - When x >= 2^46, double(xd*TWOFIFTYSIX_OVER_PI[0]) is an integer, so
				// we just need to keep its lowest 9 bits, so that when adding with the
				// lower part, it does not result in overflow when converting to integer.
				// - When x >= 2^55, the lowest 9 bit of the product
				// double(xd*TWOFIFTYSIX_OVER_PI[0]) can also be dropped.
				fputil::FPBits<double> prod(xd * TWOFIFTYSIX_OVER_PI[0]);
				prod.bits &= (x_abs < 0x5b00'0000U) ? (~0x1ffULL) : (~0ULL); // |x| < 2^55
				double truncated_prod =
				fputil::fma(xd, TWOFIFTYSIX_OVER_PI[0], -static_cast<double>(prod));
				k = static_cast<int64_t>(
				fputil::fma(xd, TWOFIFTYSIX_OVER_PI[1], truncated_prod) +
				(xbits.get_sign() ? -0.5 : 0.5));
				y = fputil::fma(xd, TWOFIFTYSIX_OVER_PI[1],
				truncated_prod - static_cast<double>(k));
				y = fputil::fma(xd, TWOFIFTYSIX_OVER_PI[2], y);
				y = fputil::fma(xd, TWOFIFTYSIX_OVER_PI[3], y);
				} else {
				// - When x >= 2^98, double(xd*TWOFIFTYSIX_OVER_PI[1])*4 is an integer, so
				// we just need to keep its lowest 11 bits, so that when adding with the
				// lower part, it does not result in overflow when converting to integer.
				// - When x >= 2^109, the lowest 11 bit of the product
				// double(xd*TWOFIFTYSIX_OVER_PI[1]) can also be dropped.
				fputil::FPBits<double> prod(xd * TWOFIFTYSIX_OVER_PI[1]);
				prod.bits &=
				(x_abs < 0x7600'0000U) ? (~0x0fffULL) : (~0ULL); // |x| < 2^109
				double truncated_prod =
				fputil::fma(xd, TWOFIFTYSIX_OVER_PI[1], -static_cast<double>(prod));
				k = static_cast<int64_t>(
				fputil::fma(xd, TWOFIFTYSIX_OVER_PI[2], truncated_prod) +
				(xbits.get_sign() ? -0.5 : 0.5));
				y = fputil::fma(xd, TWOFIFTYSIX_OVER_PI[2],
				truncated_prod - static_cast<double>(k));
				y = fputil::fma(xd, TWOFIFTYSIX_OVER_PI[3], y);
	}			}
				}

				double ysq = y * y;
				get_sin_cos(k, sin_k, cos_k);

	return invalid(y);			// Degree-5 minimax polynomial for sin(y * pi / 256) generated by Sollya with:
				// > Q = fpminimax(sin(x*pi/256)/x, [|0, 2, 4|], [|D...|], [0, 1]);
				double sin_y =
				y * fputil::polyeval(ysq, 0x1.921fb54442d18p-7, -0x1.4abbce62424a9p-22,
				0x1.466b4cf57923bp-39);
				// Degree-6 minimax polynomial for cos(y*pi/256) generated by Sollya with:
				// > P = fpminimax(cos(x*pi/256), [|0, 2, 4, 6|], [|1, D...|], [0, 1]);
				// Note that cosm1_y = cos(y * pi/256) - 1.
				double cosm1_y =
				ysq * fputil::polyeval(ysq, -0x1.3bd3cc9be45dep-14, 0x1.03c1f0819cac8p-30,
				-0x1.55d33e393617dp-48);
				// Combine the results with the sine of sum formula:
				// sin(x) = sin((k + y)*pi/256)
				// = sin(y*pi/256) * cos(k*pi/256) + cos(y*pi/256) * sin(k*pi/256)
				return fputil::fma(sin_y, cos_k, fputil::fma(cosm1_y, sin_k, sin_k));
	}			}

	} // namespace __llvm_libc			} // namespace __llvm_libc
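
The small-|x| branch above relies on a single fused multiply-add to select the correctly rounded neighbour of x in every rounding mode. A minimal sketch of that idea follows. `small_sinf_in_mode` is a hypothetical wrapper that sets the rounding mode explicitly so the effect can be observed, and it assumes that `std::fma` on `float` honors the current rounding mode on the target platform.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical wrapper: evaluates fma(x, -2^-25, x) under the given rounding
// mode. Intended only for tiny non-zero |x|, where sin(x) lies strictly
// between x and its floating-point neighbour toward zero.
float small_sinf_in_mode(float x, int mode) {
  assert(x != 0.0f && std::fabs(x) < 0x1.0p-12f);
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile float vx = x; // volatile blocks compile-time folding
  const float in = vx;
  // x - x * 2^-25 is slightly closer to zero than x; the single rounding of
  // the fma then picks x or its neighbour depending on the active mode.
  volatile float result = std::fma(in, -0x1.0p-25f, in);
  std::fesetround(saved);
  return result;
}
```

For a positive input such as `0x1.8p-13f`, round-to-nearest and `FE_UPWARD` return x itself, while `FE_DOWNWARD` and `FE_TOWARDZERO` return the next float toward zero; for the negated input, `FE_UPWARD` returns the next float toward zero.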

libc/test/src/math/exhaustive/CMakeLists.txt

add_fp_unittest(
...
DEPENDS		DEPENDS
libc.include.math		libc.include.math
libc.src.math.sqrtf		libc.src.math.sqrtf
libc.src.__support.FPUtil.fputil		libc.src.__support.FPUtil.fputil
)		)

add_fp_unittest(		add_fp_unittest(
sinf_test		sinf_test
		NO_RUN_POSTBUILD
NEED_MPFR		NEED_MPFR
SUITE		SUITE
libc_math_exhaustive_tests		libc_math_exhaustive_tests
SRCS		SRCS
sinf_test.cpp		sinf_test.cpp
DEPENDS		DEPENDS
		.exhaustive_test
libc.include.math		libc.include.math
libc.src.math.sinf		libc.src.math.sinf
libc.src.__support.FPUtil.fputil		libc.src.__support.FPUtil.fputil
		LINK_OPTIONS
		-lpthread
)		)

add_fp_unittest(		add_fp_unittest(
cosf_test		cosf_test
NEED_MPFR		NEED_MPFR
SUITE		SUITE
libc_math_exhaustive_tests		libc_math_exhaustive_tests
SRCS		SRCS

libc/test/src/math/exhaustive/sinf_test.cpp

	//===-- Exhaustive test for sinf ------------------------------------------===//			//===-- Exhaustive test for sinf ------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

				#include "exhaustive_test.h"
	#include "src/__support/FPUtil/FPBits.h"			#include "src/__support/FPUtil/FPBits.h"
	#include "src/math/sinf.h"			#include "src/math/sinf.h"
	#include "utils/MPFRWrapper/MPFRUtils.h"			#include "utils/MPFRWrapper/MPFRUtils.h"
	#include <math.h>			#include "utils/UnitTest/FPMatcher.h"

				#include <thread>

	using FPBits = __llvm_libc::fputil::FPBits<float>;			using FPBits = __llvm_libc::fputil::FPBits<float>;

	namespace mpfr = __llvm_libc::testing::mpfr;			namespace mpfr = __llvm_libc::testing::mpfr;

	TEST(LlvmLibcsinffExhaustiveTest, AllValues) {			struct LlvmLibcSinfExhaustiveTest : public LlvmLibcExhaustiveTest<uint32_t> {
	uint32_t bits = 0;			bool check(uint32_t start, uint32_t stop,
				mpfr::RoundingMode rounding) override {
				mpfr::ForceRoundingMode r(rounding);
				uint32_t bits = start;
				bool result = true;
	do {			do {
	FPBits xbits(bits);			FPBits xbits(bits);
	float x = float(xbits);			float x = float(xbits);
	ASSERT_MPFR_MATCH(mpfr::Operation::Sin, x, __llvm_libc::sinf(x), 1.0);			result &= EXPECT_MPFR_MATCH(mpfr::Operation::Sin, x, __llvm_libc::sinf(x),
	} while (bits++ < 0xffff'ffffU);			0.5, rounding);
				} while (++bits < stop);
				return result;
				}
				};

				static const int NUM_THREADS = std::thread::hardware_concurrency();

				// Range: [0, +Inf);
				static constexpr uint32_t POS_START = 0x0000'0000U;
				static constexpr uint32_t POS_STOP = 0x7f80'0000U;

				TEST_F(LlvmLibcSinfExhaustiveTest, PostiveRangeRoundNearestTieToEven) {
				test_full_range(POS_START, POS_STOP, NUM_THREADS,
				mpfr::RoundingMode::Nearest);
				}

				TEST_F(LlvmLibcSinfExhaustiveTest, PostiveRangeRoundUp) {
				test_full_range(POS_START, POS_STOP, NUM_THREADS, mpfr::RoundingMode::Upward);
				}

				TEST_F(LlvmLibcSinfExhaustiveTest, PostiveRangeRoundDown) {
				test_full_range(POS_START, POS_STOP, NUM_THREADS,
				mpfr::RoundingMode::Downward);
				}

				TEST_F(LlvmLibcSinfExhaustiveTest, PostiveRangeRoundTowardZero) {
				test_full_range(POS_START, POS_STOP, NUM_THREADS,
				mpfr::RoundingMode::TowardZero);
				}

				// Range: (-Inf, 0];
				static constexpr uint32_t NEG_START = 0x8000'0000U;
				static constexpr uint32_t NEG_STOP = 0xff80'0000U;

				TEST_F(LlvmLibcSinfExhaustiveTest, NegativeRangeRoundNearestTieToEven) {
				test_full_range(NEG_START, NEG_STOP, NUM_THREADS,
				mpfr::RoundingMode::Nearest);
				}

				TEST_F(LlvmLibcSinfExhaustiveTest, NegativeRangeRoundUp) {
				test_full_range(NEG_START, NEG_STOP, NUM_THREADS, mpfr::RoundingMode::Upward);
				}

				TEST_F(LlvmLibcSinfExhaustiveTest, NegativeRangeRoundDown) {
				test_full_range(NEG_START, NEG_STOP, NUM_THREADS,
				mpfr::RoundingMode::Downward);
				}

				TEST_F(LlvmLibcSinfExhaustiveTest, NegativeRangeRoundTowardZero) {
				test_full_range(NEG_START, NEG_STOP, NUM_THREADS,
				mpfr::RoundingMode::TowardZero);
	}			}
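
The exhaustive tests above pass a bit-pattern range and `NUM_THREADS` to `test_full_range`, whose body is not part of this diff. As a rough sketch of how such a range could be divided among workers, the helper below splits `[start, stop)` into contiguous chunks. `split_bit_range` is a hypothetical name introduced only for illustration, and this is an assumption about the general approach, not the actual implementation in `exhaustive_test.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from the patch): split the bit-pattern range
// [start, stop) into at most num_chunks contiguous, non-overlapping pieces.
std::vector<std::pair<uint32_t, uint32_t>>
split_bit_range(uint32_t start, uint32_t stop, unsigned num_chunks) {
  assert(start < stop);
  // std::thread::hardware_concurrency() is allowed to return 0, so clamp.
  if (num_chunks == 0)
    num_chunks = 1;
  // 64-bit arithmetic avoids overflow for ranges close to 2^32.
  const uint64_t total = uint64_t(stop) - start;
  const uint64_t size = (total + num_chunks - 1) / num_chunks; // ceiling
  std::vector<std::pair<uint32_t, uint32_t>> chunks;
  for (uint64_t lo = start; lo < stop; lo += size) {
    const uint64_t hi = std::min<uint64_t>(lo + size, stop);
    chunks.emplace_back(uint32_t(lo), uint32_t(hi));
  }
  return chunks;
}
```

Each `(lo, hi)` pair could then be handed to one worker that runs a loop like the one in `check(start, stop, rounding)`. Because the positive and negative ranges are split separately, no chunk straddles the sign bit.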

libc/test/src/math/sinf_test.cpp


	TEST(LlvmLibcSinfTest, InFloatRange) {			TEST(LlvmLibcSinfTest, InFloatRange) {
	constexpr uint32_t COUNT = 1000000;			constexpr uint32_t COUNT = 1000000;
	constexpr uint32_t STEP = UINT32_MAX / COUNT;			constexpr uint32_t STEP = UINT32_MAX / COUNT;
	for (uint32_t i = 0, v = 0; i <= COUNT; ++i, v += STEP) {			for (uint32_t i = 0, v = 0; i <= COUNT; ++i, v += STEP) {
	float x = float(FPBits(v));			float x = float(FPBits(v));
	if (isnan(x) || isinf(x))			if (isnan(x) || isinf(x))
	continue;			continue;
	ASSERT_MPFR_MATCH(mpfr::Operation::Sin, x, __llvm_libc::sinf(x), 1.0);			ASSERT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Sin, x,
				__llvm_libc::sinf(x), 0.5);
	}			}
	}			}

	TEST(LlvmLibcSinfTest, SpecificBitPatterns) {			TEST(LlvmLibcSinfTest, SpecificBitPatterns) {
	float x = float(FPBits(uint32_t(0xc70d39a1)));			constexpr int N = 9;
	EXPECT_MPFR_MATCH(mpfr::Operation::Sin, x, __llvm_libc::sinf(x), 1.0);			constexpr uint32_t INPUTS[N] = {
				0x4049'0fdbU, // x = pi
				0x4619'9998U, // x = 0x1.33333p+13f
				0x4afd'ece4U, // x = 0x1.fbd9c8p+22f
				0x5239'47f6U, // x = 0x1.728fecp+37f
				0x55ca'fb2aU, // x = 0x1.95f654p+44f
				0x588e'f060U, // x = 0x1.1de0cp+50f
				0x6600'0001U, // x = 0x1.000002p+77f
				0x6a19'76f1U, // x = 0x1.32ede2p+85f
				0x6f79'be45U, // x = 0x1.f37c8ap+95f
				};

				for (int i = 0; i < N; ++i) {
				float x = float(FPBits(INPUTS[i]));
				EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Sin, x,
				__llvm_libc::sinf(x), 0.5);
				}
	}			}

	// For small values, sin(x) is x.			// For small values, sin(x) is x.
	TEST(LlvmLibcSinfTest, SmallValues) {			TEST(LlvmLibcSinfTest, SmallValues) {
	float x = float(FPBits(uint32_t(0x17800000)));			float x = float(FPBits(0x1780'0000U));
	float result = __llvm_libc::sinf(x);			EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Sin, x, __llvm_libc::sinf(x),
	EXPECT_MPFR_MATCH(mpfr::Operation::Sin, x, result, 1.0);			0.5);
	EXPECT_FP_EQ(x, result);
				x = float(FPBits(0x0040'0000U));
	x = float(FPBits(uint32_t(0x00400000)));			EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Sin, x, __llvm_libc::sinf(x),
	result = __llvm_libc::sinf(x);			0.5);
	EXPECT_MPFR_MATCH(mpfr::Operation::Sin, x, result, 1.0);
	EXPECT_FP_EQ(x, result);
	}			}

	// SDCOMP-26094: check sinf in the cases for which the range reducer			// SDCOMP-26094: check sinf in the cases for which the range reducer
	// returns values furthest beyond its nominal upper bound of pi/4.			// returns values furthest beyond its nominal upper bound of pi/4.
	TEST(LlvmLibcSinfTest, SDCOMP_26094) {			TEST(LlvmLibcSinfTest, SDCOMP_26094) {
	for (uint32_t v : SDCOMP26094_VALUES) {			for (uint32_t v : SDCOMP26094_VALUES) {
	float x = float(FPBits((v)));			float x = float(FPBits((v)));
	EXPECT_MPFR_MATCH(mpfr::Operation::Sin, x, __llvm_libc::sinf(x), 1.0);			EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Sin, x,
				__llvm_libc::sinf(x), 0.5);
	}			}
	}			}
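
To make the sampling in `InFloatRange` concrete, the sketch below reproduces the loop structure (`STEP = UINT32_MAX / COUNT`, stepping through bit patterns) but only counts how many samples would be checked and how many are skipped as NaN or Inf. `SampleStats` and `sample_float_bits` are hypothetical names, and `std::memcpy` stands in for `float(FPBits(v))`. No MPFR comparison is performed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

struct SampleStats {
  uint32_t step;    // distance between consecutive sampled bit patterns
  uint32_t last;    // last bit pattern visited
  uint32_t finite;  // samples that would be compared against the reference
  uint32_t skipped; // samples that decode to NaN or Inf
};

// Mirrors the loop structure of the InFloatRange test, but only counts
// samples instead of evaluating sinf.
SampleStats sample_float_bits(uint32_t count) {
  assert(count > 0);
  SampleStats s{UINT32_MAX / count, 0, 0, 0};
  for (uint32_t i = 0, v = 0; i <= count; ++i, v += s.step) {
    float x;
    std::memcpy(&x, &v, sizeof(x)); // stand-in for float(FPBits(v))
    s.last = v;
    if (std::isnan(x) || std::isinf(x))
      ++s.skipped;
    else
      ++s.finite;
  }
  return s;
}
```

With `count = 1000000` the step is 4294 bit patterns, so the loop visits 1000001 evenly spaced encodings covering both signs, and only the ones whose exponent field is all ones are skipped.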