This is an archive of the discontinued LLVM Phabricator instance.


Differential D158551

[libc][math] Implement double precision exp function correctly rounded for all rounding modes.
Closed, Public

Authored by lntue on Aug 22 2023, 2:07 PM.


Details

Reviewers

michaelrj
sivachandra
cqlauter
zimmermann6

Commits

rG434bf1608445: [libc][math] Implement double precision exp function correctly rounded for all…

Summary

Implement double precision exp function correctly rounded for all
rounding modes. Using 4 stages:

- Range reduction: reduce to exp(x) = 2^hi * 2^mid1 * 2^mid2 * exp(lo).
- Use 64 + 64 LUT for 2^mid1 and 2^mid2, and use cubic Taylor polynomial to
  approximate (exp(lo) - 1) / lo in double precision. Relative error in this
  step is bounded by 1.5 * 2^-63.
- If the rounding test fails, use degree-6 Taylor polynomial to approximate
  exp(lo) in double-double precision. Relative error in this step is bounded by
  2^-99.
- If the rounding test still fails, use degree-7 Taylor polynomial to compute
  exp(lo) in ~128-bit precision.
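The first two stages of the summary can be sketched as follows. This is a minimal illustration and not the code from the patch: the name `exp_sketch` is hypothetical, `std::exp2` stands in for the two 64-entry lookup tables, the reduction uses a single rounded constant for ln(2)/2^12, and no rounding test or higher-precision fallback is attempted, so the result is accurate to roughly double precision but is not guaranteed to be correctly rounded.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the split exp(x) = 2^hi * 2^mid1 * 2^mid2 * exp(lo).
// Assumptions (not taken from the patch): the name exp_sketch, the
// single-constant reduction, and std::exp2 in place of the lookup tables.
double exp_sketch(double x) {
  // Only moderate inputs are handled: no overflow, underflow, NaN or inf.
  assert(std::fabs(x) < 700.0);

  constexpr double LOG2_E = 1.4426950408889634;                  // log2(e)
  constexpr double LN2_OVER_4096 = 0.6931471805599453 / 4096.0;  // ln(2)/2^12

  // kd = round(x * 2^12 * log2(e)), so that x = kd * ln(2) / 2^12 + lo.
  double kd = std::nearbyint(x * 4096.0 * LOG2_E);
  double lo = std::fma(-kd, LN2_OVER_4096, x);

  // Split kd = hi * 2^12 + idx1 * 2^6 + idx2 with idx1, idx2 in [0, 63].
  double hi = std::floor(kd / 4096.0);
  int idx = static_cast<int>(kd - hi * 4096.0);
  int idx1 = idx >> 6;
  int idx2 = idx & 63;

  // Stand-ins for the two table lookups: 2^mid1 and 2^mid2.
  double exp_mid1 = std::exp2(idx1 / 64.0);
  double exp_mid2 = std::exp2(idx2 / 4096.0);

  // Cubic Taylor polynomial for (exp(lo) - 1) / lo in Horner form:
  //   1 + lo/2 + lo^2/6 + lo^3/24
  double p = 1.0 + lo * (0.5 + lo * (1.0 / 6.0 + lo * (1.0 / 24.0)));
  double exp_lo = 1.0 + lo * p;

  return std::ldexp(exp_mid1 * exp_mid2 * exp_lo, static_cast<int>(hi));
}
```

Because |lo| stays below about 2^-13, the truncated Taylor terms are far smaller than the double-precision rounding error, which is why a cubic polynomial is enough at this stage.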

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lntue created this revision. Aug 22 2023, 2:07 PM

Herald added projects: Restricted Project, Restricted Project. Aug 22 2023, 2:07 PM

Herald added subscribers: luke, sunshaoce, frasercrmck and 21 others.

lntue requested review of this revision. Aug 22 2023, 2:07 PM

Herald added a subscriber: wangpc. Aug 22 2023, 2:07 PM

Harbormaster completed remote builds in B254191: Diff 552501. Aug 22 2023, 2:25 PM

Remove unused template.

Harbormaster completed remote builds in B254200: Diff 552512. Aug 22 2023, 2:43 PM

When I apply this patch on main (commit 96b5ea6) and build libc.a, I don't find the symbol exp in libc.a:

zimmerma@biscotte:/localdisk/zimmerma/llvm-project$ nm /localdisk/zimmerma/llvm-project/build/projects/libc/lib/libc.a | grep -w exp
zimmerma@biscotte:/localdisk/zimmerma/llvm-project$

Maybe I did something wrong?

This revision now requires changes to proceed. Aug 23 2023, 1:21 AM

Hi Paul, do you mind sending me the build commands that you used and their
logs, and/or attaching the generated libc.a.

Thanks,

Tue

In D158551#4609925, @lntue wrote:
Hi Paul, do you mind sending me the build commands that you used and their
logs, and/or attaching the generated libc.a.

Thanks,
Tue

sure, I'll do that in a private mail.

After help from Tue, I was able to use this patch with the core-math tools. All tests pass, and on my machine (AMD EPYC 7282) I get the following timings:

zimmerma@biscotte:/tmp/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libm.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh exp
GNU libc version: 2.37
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 18.639 + 0.385 clc/call; Median-Min = 0.327 clc/call; Max = 21.116 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 13.164 + 0.253 clc/call; Median-Min = 0.280 clc/call; Max = 13.590 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 19.464 + 0.307 clc/call; Median-Min = 0.308 clc/call; Max = 19.828 clc/call;

and for latency:

zimmerma@biscotte:/tmp/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libm.a CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh exp
GNU libc version: 2.37
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 49.458 + 0.339 clc/call; Median-Min = 0.269 clc/call; Max = 50.109 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 39.380 + 0.326 clc/call; Median-Min = 0.299 clc/call; Max = 39.948 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 54.919 + 0.347 clc/call; Median-Min = 0.310 clc/call; Max = 55.561 clc/call;

This revision is now accepted and ready to land. Aug 24 2023, 1:07 AM

Closed by commit rG434bf1608445: [libc][math] Implement double precision exp function correctly rounded for all… (authored by lntue). Aug 24 2023, 7:17 AM

This revision was automatically updated to reflect the committed changes.

lntue added a commit: rG434bf1608445: [libc][math] Implement double precision exp function correctly rounded for all….

lntue mentioned this in D158812: [libc][math] Implement double precision exp2 function correctly rounded for all rounding modes. Aug 24 2023, 9:46 PM

lntue mentioned this in rG8ca614aa22df: [libc][math] Implement double precision exp2 function correctly rounded for all…. Aug 25 2023, 7:15 AM

lntue mentioned this in D159143: [libc][math] Implement double precision exp10 function correctly rounded for all rounding modes. Aug 29 2023, 3:17 PM

lntue mentioned this in rG76bb278ebb3e: [libc][math] Implement double precision exp10 function correctly rounded for…. Aug 30 2023, 5:43 AM

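Revision Contents

Path                                         Size
libc/config/darwin/arm/entrypoints.txt       1 line
libc/config/linux/aarch64/entrypoints.txt    1 line
libc/config/linux/riscv64/entrypoints.txt    1 line
libc/config/linux/x86_64/entrypoints.txt     1 line
libc/config/windows/entrypoints.txt          1 line
libc/docs/math/index.rst                     4 lines
libc/spec/stdc.td                            2 lines
libc/src/__support/FPUtil/PolyEval.h         7 lines
libc/src/__support/FPUtil/double_double.h    26 lines
libc/src/__support/FPUtil/dyadic_float.h     80 lines
libc/src/__support/FPUtil/multiply_add.h     8 lines
libc/src/math/CMakeLists.txt                 1 line
libc/src/math/exp.h                          18 lines
libc/src/math/generic/CMakeLists.txt         23 lines
libc/src/math/generic/exp.cpp                595 lines
libc/test/src/math/CMakeLists.txt            14 lines
libc/test/src/math/exp_test.cpp              123 lines
libc/test/src/math/log10_test.cpp            5 lines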

Diff 552512

libc/config/darwin/arm/entrypoints.txt

@@ [123 lines not shown] @@ set(TARGET_LIBM_ENTRYPOINTS
      libc.src.math.copysignf
      libc.src.math.copysignl
      libc.src.math.ceil
      libc.src.math.ceilf
      libc.src.math.ceill
      libc.src.math.coshf
      libc.src.math.cosf
      libc.src.math.erff
+     libc.src.math.exp
      libc.src.math.expf
      libc.src.math.exp10f
      libc.src.math.exp2f
      libc.src.math.expm1f
      libc.src.math.fabs
      libc.src.math.fabsf
      libc.src.math.fabsl
      libc.src.math.fdim
@@ [90 lines not shown] @@

libc/config/linux/aarch64/entrypoints.txt

@@ [237 lines not shown] @@ set(TARGET_LIBM_ENTRYPOINTS
      libc.src.math.copysignf
      libc.src.math.copysignl
      libc.src.math.ceil
      libc.src.math.ceilf
      libc.src.math.ceill
      libc.src.math.coshf
      libc.src.math.cosf
      libc.src.math.erff
+     libc.src.math.exp
      libc.src.math.expf
      libc.src.math.exp10f
      libc.src.math.exp2f
      libc.src.math.expm1f
      libc.src.math.fabs
      libc.src.math.fabsf
      libc.src.math.fabsl
      libc.src.math.fdim
@@ [242 lines not shown] @@

libc/config/linux/riscv64/entrypoints.txt

@@ [246 lines not shown] @@ set(TARGET_LIBM_ENTRYPOINTS
      libc.src.math.copysignf
      libc.src.math.copysignl
      libc.src.math.ceil
      libc.src.math.ceilf
      libc.src.math.ceill
      libc.src.math.coshf
      libc.src.math.cosf
      libc.src.math.erff
+     libc.src.math.exp
      libc.src.math.expf
      libc.src.math.exp10f
      libc.src.math.exp2f
      libc.src.math.expm1f
      libc.src.math.fabs
      libc.src.math.fabsf
      libc.src.math.fabsl
      libc.src.math.fdim
@@ [273 lines not shown] @@

libc/config/linux/x86_64/entrypoints.txt

@@ [250 lines not shown] @@ set(TARGET_LIBM_ENTRYPOINTS
      libc.src.math.copysignl
      libc.src.math.ceil
      libc.src.math.ceilf
      libc.src.math.ceill
      libc.src.math.cos
      libc.src.math.coshf
      libc.src.math.cosf
      libc.src.math.erff
+     libc.src.math.exp
      libc.src.math.expf
      libc.src.math.exp10f
      libc.src.math.exp2f
      libc.src.math.expm1f
      libc.src.math.fabs
      libc.src.math.fabsf
      libc.src.math.fabsl
      libc.src.math.fdim
@@ [274 lines not shown] @@

libc/config/windows/entrypoints.txt

@@ [122 lines not shown] @@ set(TARGET_LIBM_ENTRYPOINTS
      libc.src.math.copysignl
      libc.src.math.ceil
      libc.src.math.ceilf
      libc.src.math.ceill
      libc.src.math.cos
      libc.src.math.cosf
      libc.src.math.coshf
      libc.src.math.erff
+     libc.src.math.exp
      libc.src.math.expf
      libc.src.math.exp10f
      libc.src.math.exp2f
      libc.src.math.expm1f
      libc.src.math.fabs
      libc.src.math.fabsf
      libc.src.math.fabsl
      libc.src.math.fdim
@@ [93 lines not shown] @@

libc/docs/math/index.rst

@@ [346 lines not shown] @@
  | erfl       |         |         |         |         |         |         |         |         |         |         |         |         |
  +------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
  | erfc       |         |         |         |         |         |         |         |         |         |         |         |         |
  +------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
  | erfcf      |         |         |         |         |         |         |         |         |         |         |         |         |
  +------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
  | erfcl      |         |         |         |         |         |         |         |         |         |         |         |         |
  +------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
- | exp        |         |         |         |         |         |         |         |         |         |         |         |         |
+ | exp        | |check| | |check| |         | |check| | |check| |         |         | |check| |         |         |         |         |
  +------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
  | expf       | |check| | |check| |         | |check| | |check| |         |         | |check| |         |         |         |         |
  +------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
  | expl       |         |         |         |         |         |         |         |         |         |         |         |         |
  +------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
  | exp10      |         |         |         |         |         |         |         |         |         |         |         |         |
  +------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
  | exp10f     | |check| | |check| |         | |check| | |check| |         |         | |check| |         |         |         |         |
@@ [114 lines not shown] @@
  acosh |check|
  asin |check|
  asinh |check|
  atan |check|
  atanh |check|
  cos |check| large
  cosh |check|
  erf |check|
- exp |check|
+ exp |check| |check|
  exp10 |check|
  exp2 |check|
  expm1 |check|
  fma |check| |check|
  hypot |check| |check|
  log |check| |check|
  log10 |check| |check|
  log1p |check| |check|
@@ [105 lines not shown] @@

libc/spec/stdc.td

@@ [428 lines not shown] @@ HeaderSpec Math = HeaderSpec<
        FunctionSpec<"cosf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
        FunctionSpec<"sin", RetValSpec<DoubleType>, [ArgSpec<DoubleType>]>,
        FunctionSpec<"sinf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
        FunctionSpec<"tan", RetValSpec<DoubleType>, [ArgSpec<DoubleType>]>,
        FunctionSpec<"tanf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,

        FunctionSpec<"erff", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,

+       FunctionSpec<"exp", RetValSpec<DoubleType>, [ArgSpec<DoubleType>]>,
        FunctionSpec<"expf", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,

        FunctionSpec<"exp2f", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,
        FunctionSpec<"expm1f", RetValSpec<FloatType>, [ArgSpec<FloatType>]>,

        FunctionSpec<"remainderf", RetValSpec<FloatType>, [ArgSpec<FloatType>, ArgSpec<FloatType>]>,
        FunctionSpec<"remainder", RetValSpec<DoubleType>, [ArgSpec<DoubleType>, ArgSpec<DoubleType>]>,
        FunctionSpec<"remainderl", RetValSpec<LongDoubleType>, [ArgSpec<LongDoubleType>, ArgSpec<LongDoubleType>]>,

        FunctionSpec<"remquof", RetValSpec<FloatType>, [ArgSpec<FloatType>, ArgSpec<FloatType>, ArgSpec<IntPtr>]>,
@@ [702 lines not shown] @@

libc/src/__support/FPUtil/PolyEval.h

  //===-- Common header for PolyEval implementations --------------*- C++ -*-===//
  //
  // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  // See https://llvm.org/LICENSE.txt for license information.
  // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  //
  //===----------------------------------------------------------------------===//

  #ifndef LLVM_LIBC_SRC_SUPPORT_FPUTIL_POLYEVAL_H
  #define LLVM_LIBC_SRC_SUPPORT_FPUTIL_POLYEVAL_H

  #include "multiply_add.h"
+ #include "src/__support/CPP/type_traits.h"
  #include "src/__support/common.h"

  // Evaluate polynomial using Horner's Scheme:
  // With polyeval(x, a_0, a_1, ..., a_n) = a_n * x^n + ... + a_1 * x + a_0, we
  // evaluated it as: a_0 + x * (a_1 + x * ( ... (a_(n-1) + x * a_n) ... ) ) ).
  // We will use FMA instructions if available.
  // Example: to evaluate x^3 + 2x^2 + 3x + 4, call
  // polyeval( x, 4.0, 3.0, 2.0, 1.0 )

  namespace __llvm_libc {
  namespace fputil {

- template <typename T> LIBC_INLINE T polyeval(T, T a0) { return a0; }
+ template <typename T> LIBC_INLINE T polyeval(const T &, const T &a0) {
+   return a0;
+ }

  template <typename T, typename... Ts>
- LIBC_INLINE T polyeval(T x, T a0, Ts... a) {
+ LIBC_INLINE T polyeval(const T &x, const T &a0, const Ts &...a) {
    return multiply_add(x, polyeval(x, a...), a0);
  }

  } // namespace fputil
  } // namespace __llvm_libc

  #endif // LLVM_LIBC_SRC_SUPPORT_FPUTIL_POLYEVAL_H
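
The Horner recursion described in the header comment can be tried outside the libc tree with a small standalone version. This is a sketch under stated assumptions: the name `polyeval_sketch` is hypothetical, `std::fma` replaces the library's `multiply_add`, and only built-in floating-point types are considered.

```cpp
#include <cmath>

// Base case: a degree-0 polynomial is just its constant term.
template <typename T> T polyeval_sketch(const T &, const T &a0) { return a0; }

// Recursive case: a0 + x * (a1 + x * (...)), one fused multiply-add per step.
// Coefficients are passed lowest degree first, as in the header comment.
template <typename T, typename... Ts>
T polyeval_sketch(const T &x, const T &a0, const Ts &...a) {
  return std::fma(x, polyeval_sketch(x, a...), a0);
}
```

For the example in the header comment, `polyeval_sketch(2.0, 4.0, 3.0, 2.0, 1.0)` evaluates x^3 + 2x^2 + 3x + 4 at x = 2 and gives 26.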

libc/src/__support/FPUtil/double_double.h

@@ [25 lines not shown] @@ LIBC_INLINE constexpr DoubleDouble exact_add(double a, double b) {
    DoubleDouble r{0.0, 0.0};
    r.hi = a + b;
    double t = r.hi - a;
    r.lo = b - t;
    return r;
  }

  // Assumption: |a.hi| >= |b.hi|
- LIBC_INLINE constexpr DoubleDouble add(DoubleDouble a, DoubleDouble b) {
+ LIBC_INLINE constexpr DoubleDouble add(const DoubleDouble &a,
+                                        const DoubleDouble &b) {
    DoubleDouble r = exact_add(a.hi, b.hi);
    double lo = a.lo + b.lo;
    return exact_add(r.hi, r.lo + lo);
  }

  // Assumption: |a.hi| >= |b|
- LIBC_INLINE constexpr DoubleDouble add(DoubleDouble a, double b) {
+ LIBC_INLINE constexpr DoubleDouble add(const DoubleDouble &a, double b) {
    DoubleDouble r = exact_add(a.hi, b);
    return exact_add(r.hi, r.lo + a.lo);
  }

  // Velkamp's Splitting for double precision.
  LIBC_INLINE constexpr DoubleDouble split(double a) {
    DoubleDouble r{0.0, 0.0};
    // Splitting constant = 2^ceil(prec(double)/2) + 1 = 2^27 + 1.
@@ [20 lines not shown] @@ #else
    double t2 = as.hi * bs.lo + t1;
    double t3 = as.lo * bs.hi + t2;
    r.lo = as.lo * bs.lo + t3;
  #endif // LIBC_TARGET_CPU_HAS_FMA

    return r;
  }

- LIBC_INLINE DoubleDouble quick_mult(DoubleDouble a, DoubleDouble b) {
+ LIBC_INLINE DoubleDouble quick_mult(double a, const DoubleDouble &b) {
+   DoubleDouble r = exact_mult(a, b.hi);
+   r.lo = multiply_add(a, b.lo, r.lo);
+   return r;
+ }
+
+ LIBC_INLINE DoubleDouble quick_mult(const DoubleDouble &a,
+                                     const DoubleDouble &b) {
    DoubleDouble r = exact_mult(a.hi, b.hi);
-   double t1 = fputil::multiply_add(a.hi, b.lo, r.lo);
-   double t2 = fputil::multiply_add(a.lo, b.hi, t1);
+   double t1 = multiply_add(a.hi, b.lo, r.lo);
+   double t2 = multiply_add(a.lo, b.hi, t1);
    r.lo = t2;
    return r;
  }

+ // Assuming |c| >= |a * b|.
+ template <>
+ LIBC_INLINE DoubleDouble multiply_add<DoubleDouble>(const DoubleDouble &a,
+                                                     const DoubleDouble &b,
+                                                     const DoubleDouble &c) {
+   return add(c, quick_mult(a, b));
+ }
+
  } // namespace __llvm_libc::fputil

  #endif // LLVM_LIBC_SRC_SUPPORT_FPUTIL_DOUBLEDOUBLE_H
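
To see what `exact_add` and `add` buy in practice, the same arithmetic can be reproduced in a standalone sketch. The type and function names below (`DoubleDoubleSketch`, `exact_add_sketch`, `add_sketch`) are hypothetical, and the code assumes strict IEEE double arithmetic (no `-ffast-math`, no x87 excess precision).

```cpp
#include <cassert>
#include <cmath>

// A value represented as the unevaluated sum hi + lo.
struct DoubleDoubleSketch {
  double hi;
  double lo;
};

// Fast2Sum: returns hi = round(a + b) and the rounding error lo, so that
// hi + lo == a + b exactly. Requires |a| >= |b|.
DoubleDoubleSketch exact_add_sketch(double a, double b) {
  assert(std::fabs(a) >= std::fabs(b));
  double hi = a + b;
  double t = hi - a;
  return {hi, b - t};
}

// Add a double to a double-double, assuming |a.hi| >= |b|.
DoubleDoubleSketch add_sketch(const DoubleDoubleSketch &a, double b) {
  DoubleDoubleSketch r = exact_add_sketch(a.hi, b);
  return exact_add_sketch(r.hi, r.lo + a.lo);
}
```

For instance, adding 1.5 * 2^-53 to 1.0 rounds `hi` up to 1 + 2^-52, and `lo` records the overshoot of -2^-54, so no information is lost.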

libc/src/__support/FPUtil/dyadic_float.h

@@ [76 lines not shown] @@ template <size_t Bits> struct DyadicFloat {

    // Used for aligning exponents. Output might not be normalized.
    DyadicFloat &shift_right(int shift_length) {
      exponent += shift_length;
      mantissa >>= static_cast<size_t>(shift_length);
      return *this;
    }

-   // Assume that it is already normalized and output is also normal.
+   // Assume that it is already normalized and output is not underflow.
    // Output is rounded correctly with respect to the current rounding mode.
-   // TODO(lntue): Test or add support for denormal output.
+   // TODO(lntue): Add support for underflow.
    // TODO(lntue): Test or add specialization for x86 long double.
    template <typename T, typename = cpp::enable_if_t<
                              cpp::is_floating_point_v<T> &&
                                  (FloatProperties<T>::MANTISSA_WIDTH < Bits),
                              void>>
    explicit operator T() const {
      // TODO(lntue): Do we need to treat signed zeros properly?
      if (mantissa.is_zero())
        return 0.0;

      // Assume that it is normalized, and output is also normal.
      constexpr size_t PRECISION = FloatProperties<T>::MANTISSA_WIDTH + 1;
      using output_bits_t = typename FPBits<T>::UIntType;

-     MantissaType m_hi(mantissa >> (Bits - PRECISION));
-     auto d_hi = FPBits<T>::create_value(
-         sign, exponent + (Bits - 1) + FloatProperties<T>::EXPONENT_BIAS,
-         output_bits_t(m_hi) & FloatProperties<T>::MANTISSA_MASK);
+     int exp_hi = exponent + static_cast<int>((Bits - 1) +
+                                              FloatProperties<T>::EXPONENT_BIAS);

-     const MantissaType round_mask = MantissaType(1) << (Bits - PRECISION - 1);
+     bool denorm = false;
+     uint32_t shift = Bits - PRECISION;
+     if (LIBC_UNLIKELY(exp_hi <= 0)) {
+       // Output is denormal.
+       denorm = true;
+       shift = (Bits - PRECISION) + static_cast<uint32_t>(1 - exp_hi);
+
+       exp_hi = FloatProperties<T>::EXPONENT_BIAS;
+     }
+
+     int exp_lo = exp_hi - PRECISION - 1;
+
+     MantissaType m_hi(mantissa >> shift);
+
+     T d_hi = FPBits<T>::create_value(sign, exp_hi,
+                                      output_bits_t(m_hi) &
+                                          FloatProperties<T>::MANTISSA_MASK)
+                  .get_val();
+
+     const MantissaType round_mask = MantissaType(1) << (shift - 1);
      const MantissaType sticky_mask = round_mask - MantissaType(1);

      bool round_bit = !(mantissa & round_mask).is_zero();
      bool sticky_bit = !(mantissa & sticky_mask).is_zero();
      int round_and_sticky = int(round_bit) * 2 + int(sticky_bit);
-     auto d_lo = FPBits<T>::create_value(sign,
-                                         exponent + (Bits - PRECISION - 2) +
-                                             FloatProperties<T>::EXPONENT_BIAS,
-                                         output_bits_t(0));
+
+     T d_lo;
+     if (LIBC_UNLIKELY(exp_lo <= 0)) {
+       // d_lo is denormal, but the output is normal.
+       int scale_up_exponent = 2 * PRECISION;
+       T scale_up_factor =
+           FPBits<T>::create_value(
+               sign, FloatProperties<T>::EXPONENT_BIAS + scale_up_exponent,
+               output_bits_t(0))
+               .get_val();
+       T scale_down_factor =
+           FPBits<T>::create_value(
+               sign, FloatProperties<T>::EXPONENT_BIAS - scale_up_exponent,
+               output_bits_t(0))
+               .get_val();
+
+       d_lo = FPBits<T>::create_value(sign, exp_lo + scale_up_exponent,
+                                      output_bits_t(0))
+                  .get_val();
+
+       return multiply_add(d_lo, T(round_and_sticky), d_hi * scale_up_factor) *
+              scale_down_factor;
+     }
+
+     d_lo = FPBits<T>::create_value(sign, exp_lo, output_bits_t(0)).get_val();

      // Still correct without FMA instructions if `d_lo` is not underflow.
-     return multiply_add(d_lo.get_val(), T(round_and_sticky), d_hi.get_val());
+     T r = multiply_add(d_lo, T(round_and_sticky), d_hi);
+
+     if (LIBC_UNLIKELY(denorm)) {
+       // Output is denormal, simply clear the exponent field.
+       output_bits_t clear_exp = output_bits_t(exp_hi)
+                                 << FloatProperties<T>::MANTISSA_WIDTH;
+       output_bits_t r_bits = FPBits<T>(r).uintval() - clear_exp;
+       return FPBits<T>(r_bits).get_val();
+     }
+
+     return r;
    }

    explicit operator MantissaType() const {
      if (mantissa.is_zero())
        return 0;

      MantissaType new_mant = mantissa;
      if (exponent > 0) {
@@ [93 lines not shown] @@ if (result.mantissa.val[DyadicFloat<Bits>::MantissaType::WORDCOUNT - 1] >>
          0)
        result.shift_left(1);
    } else {
      result.mantissa = (typename DyadicFloat<Bits>::MantissaType)(0);
    }
    return result;
  }

+ // Simple polynomial approximation.
+ template <size_t Bits>
+ constexpr DyadicFloat<Bits> multiply_add(const DyadicFloat<Bits> &a,
+                                          const DyadicFloat<Bits> &b,
+                                          const DyadicFloat<Bits> &c) {
+   return quick_add(c, quick_mul(a, b));
+ }
+
  // Simple exponentiation implementation for printf. Only handles positive
  // exponents, since division isn't implemented.
  template <size_t Bits>
  constexpr DyadicFloat<Bits> pow_n(DyadicFloat<Bits> a, uint32_t power) {
    DyadicFloat<Bits> result = 1.0;
    DyadicFloat<Bits> cur_power = a;

    while (power > 0) {
@@ [19 lines not shown] @@
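
The rounding idea in the conversion operator (truncate to the target precision, then add `round_and_sticky` quarter-ulps with one fused multiply-add so the hardware applies the current rounding mode) can be illustrated on a plain 64-bit mantissa. This is a simplified sketch, not the patched code: `round_from_mantissa` is a hypothetical name, the mantissa is a `uint64_t` rather than a wide integer, the sign is ignored, and the result is assumed to be a normal double.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert m * 2^e to double, rounded according to the current rounding mode.
// Assumes m is normalized (bit 63 set) and the result is a normal double.
double round_from_mantissa(std::uint64_t m, int e) {
  assert((m >> 63) == 1 && "mantissa must be normalized");
  constexpr int PRECISION = 53;          // significand bits of double
  constexpr int SHIFT = 64 - PRECISION;  // 11 low bits get discarded

  // Top 53 bits: exactly representable, so this conversion is exact.
  double d_hi = std::ldexp(static_cast<double>(m >> SHIFT), e + SHIFT);

  // Round bit = first discarded bit; sticky bit = OR of the remaining ones.
  bool round_bit = ((m >> (SHIFT - 1)) & 1) != 0;
  bool sticky_bit = (m & ((std::uint64_t(1) << (SHIFT - 1)) - 1)) != 0;
  int round_and_sticky = 2 * int(round_bit) + int(sticky_bit);

  // d_lo is a quarter of d_hi's unit in the last place, so the sum below is
  // d_hi plus 0, 1/4, 1/2 or 3/4 ulp, and the FMA performs the final rounding.
  double d_lo = std::ldexp(1.0, e + SHIFT - 2);
  return std::fma(d_lo, static_cast<double>(round_and_sticky), d_hi);
}
```

Under the default round-to-nearest mode, an exact tie (round bit set, sticky bit clear) resolves to the even neighbour, while any nonzero sticky bit pushes the result up; other rounding modes are handled by the same FMA without extra branches.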

libc/src/__support/FPUtil/multiply_add.h

@@ [14 lines not shown] @@

  namespace __llvm_libc {
  namespace fputil {

  // Implement a simple wrapper for multiply-add operation:
  // multiply_add(x, y, z) = x*y + z
  // which uses FMA instructions to speed up if available.

- template <typename T> LIBC_INLINE T multiply_add(T x, T y, T z) {
+ template <typename T>
+ LIBC_INLINE T multiply_add(const T &x, const T &y, const T &z) {
    return x * y + z;
  }

  } // namespace fputil
  } // namespace __llvm_libc

  #if defined(LIBC_TARGET_CPU_HAS_FMA)

  // FMA instructions are available.
  #include "FMA.h"

  namespace __llvm_libc {
  namespace fputil {

- template <> LIBC_INLINE float multiply_add<float>(float x, float y, float z) {
+ LIBC_INLINE float multiply_add(float x, float y, float z) {
    return fma(x, y, z);
  }

- template <>
- LIBC_INLINE double multiply_add<double>(double x, double y, double z) {
+ LIBC_INLINE double multiply_add(double x, double y, double z) {
    return fma(x, y, z);
  }

  } // namespace fputil
  } // namespace __llvm_libc

  #endif // LIBC_TARGET_CPU_HAS_FMA

  #endif // LLVM_LIBC_SRC_SUPPORT_FPUTIL_MULTIPLY_ADD_H
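
The switch from explicit specializations to plain overloads can be mimicked in a standalone sketch. One plausible reading of the change (an assumption, not stated in the review) is that once the generic template takes `const T &`, by-value specializations for `float` and `double` no longer match its signature, whereas ordinary overloads still win overload resolution for those types. The name `multiply_add_sketch` is hypothetical and `std::fma` stands in for the library's `fma`.

```cpp
#include <cmath>

// Generic fallback: works for any type with operator* and operator+.
template <typename T>
T multiply_add_sketch(const T &x, const T &y, const T &z) {
  return x * y + z;
}

// Non-template overloads: preferred over the template for float and double,
// and computed with a single rounding via fused multiply-add.
inline float multiply_add_sketch(float x, float y, float z) {
  return std::fma(x, y, z);
}

inline double multiply_add_sketch(double x, double y, double z) {
  return std::fma(x, y, z);
}
```

With x = 1 + 2^-27 and y = 1 - 2^-27, the exact product is 1 - 2^-54; the fused version of `x * y - 1` therefore returns -2^-54, a value that a separately rounded product would lose.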

libc/src/math/CMakeLists.txt

@@ [73 lines not shown] @@

  add_math_entrypoint_object(cos)
  add_math_entrypoint_object(cosf)
  add_math_entrypoint_object(cosh)
  add_math_entrypoint_object(coshf)

  add_math_entrypoint_object(erff)

+ add_math_entrypoint_object(exp)
  add_math_entrypoint_object(expf)

  add_math_entrypoint_object(exp2f)

  add_math_entrypoint_object(exp10f)

  add_math_entrypoint_object(expm1f)

@@ [129 lines not shown] @@

libc/src/math/exp.h

This file was added.

				//===-- Implementation header for exp ---------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIBC_SRC_MATH_EXP_H
				#define LLVM_LIBC_SRC_MATH_EXP_H

				namespace __llvm_libc {

				double exp(double x);

				} // namespace __llvm_libc

				#endif // LLVM_LIBC_SRC_MATH_EXP_H

libc/src/math/generic/CMakeLists.txt

Show First 20 Lines • Show All 543 Lines • ▼ Show 20 Lines	DEPENDS
libc.src.__support.FPUtil.polyeval		libc.src.__support.FPUtil.polyeval
libc.src.__support.macros.optimization		libc.src.__support.macros.optimization
libc.include.math		libc.include.math
COMPILE_OPTIONS		COMPILE_OPTIONS
-O3		-O3
)		)

add_entrypoint_object(		add_entrypoint_object(
		exp
		SRCS
		exp.cpp
		HDRS
		../exp.h
		DEPENDS
		.common_constants
		libc.src.__support.FPUtil.basic_operations
		libc.src.__support.FPUtil.fenv_impl
		libc.src.__support.FPUtil.fp_bits
		libc.src.__support.FPUtil.multiply_add
		libc.src.__support.FPUtil.nearest_integer
		libc.src.__support.FPUtil.polyeval
		libc.src.__support.FPUtil.rounding_mode
		libc.src.__support.macros.optimization
		libc.include.errno
		libc.src.errno.errno
		libc.include.math
		COMPILE_OPTIONS
		-O3
		)

		add_entrypoint_object(
expf		expf
SRCS		SRCS
expf.cpp		expf.cpp
HDRS		HDRS
../expf.h		../expf.h
DEPENDS		DEPENDS
.common_constants		.common_constants
libc.src.__support.FPUtil.basic_operations		libc.src.__support.FPUtil.basic_operations

libc/src/math/generic/exp.cpp

This file was added.

				//===-- Double-precision e^x function -------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "src/math/exp.h"
				#include "common_constants.h" // Lookup tables EXP_M1 and EXP_M2.
				#include "src/__support/CPP/bit.h"
				#include "src/__support/CPP/optional.h"
				#include "src/__support/FPUtil/FEnvImpl.h"
				#include "src/__support/FPUtil/FPBits.h"
				#include "src/__support/FPUtil/PolyEval.h"
				#include "src/__support/FPUtil/double_double.h"
				#include "src/__support/FPUtil/dyadic_float.h"
				#include "src/__support/FPUtil/multiply_add.h"
				#include "src/__support/FPUtil/nearest_integer.h"
				#include "src/__support/FPUtil/rounding_mode.h"
				#include "src/__support/common.h"
				#include "src/__support/macros/optimization.h" // LIBC_UNLIKELY

				#include <errno.h>

				namespace __llvm_libc {

				using fputil::DoubleDouble;
				using Float128 = typename fputil::DyadicFloat<128>;

				// log2(e)
				constexpr double LOG2_E = 0x1.71547652b82fep+0;

				// Error bounds:
				// Errors when using double precision.
				constexpr double ERR_D = 0x1.8p-63;
				// Errors when using double-double precision.
				constexpr double ERR_DD = 0x1.0p-99;

				struct TripleDouble {
				double hi = 0.0;
				double mid = 0.0;
				double lo = 0.0;
				};

				// -2^-12 * log(2)
				// > a = -2^-12 * log(2);
				// > b = round(a, 30, RN);
				// > c = round(a - b, 30, RN);
				// > d = round(a - b - c, D, RN);
				// Errors < 1.5 * 2^-133
				constexpr double MLOG_2_EXP2_M12_HI = -0x1.62e42ffp-13;
				constexpr double MLOG_2_EXP2_M12_MID = 0x1.718432a1b0e26p-47;
				constexpr double MLOG_2_EXP2_M12_MID_30 = 0x1.718432ap-47;
				constexpr double MLOG_2_EXP2_M12_LO = 0x1.b0e2633fe0685p-79;

				// 2^(k * 2^-6), for k = 0..63.
				constexpr TripleDouble EXP_MID1[64] = {
				{0x1p0, 0, 0},
				{0x1.02c9a3e778061p0, -0x1.19083535b085dp-56, -0x1.9085b0a3d74d5p-110},
				{0x1.059b0d3158574p0, 0x1.d73e2a475b465p-55, 0x1.05ff94f8d257ep-110},
				{0x1.0874518759bc8p0, 0x1.186be4bb284ffp-57, 0x1.15820d96b414fp-111},
				{0x1.0b5586cf9890fp0, 0x1.8a62e4adc610bp-54, -0x1.67c9bd6ebf74cp-108},
				{0x1.0e3ec32d3d1a2p0, 0x1.03a1727c57b53p-59, -0x1.5aa76994e9ddbp-113},
				{0x1.11301d0125b51p0, -0x1.6c51039449b3ap-54, 0x1.9d58b988f562dp-109},
				{0x1.1429aaea92dep0, -0x1.32fbf9af1369ep-54, -0x1.2fe7bb4c76416p-108},
				{0x1.172b83c7d517bp0, -0x1.19041b9d78a76p-55, 0x1.4f2406aa13ffp-109},
				{0x1.1a35beb6fcb75p0, 0x1.e5b4c7b4968e4p-55, 0x1.ad36183926ae8p-111},
				{0x1.1d4873168b9aap0, 0x1.e016e00a2643cp-54, 0x1.ea62d0881b918p-110},
				{0x1.2063b88628cd6p0, 0x1.dc775814a8495p-55, -0x1.781dbc16f1ea4p-111},
				{0x1.2387a6e756238p0, 0x1.9b07eb6c70573p-54, -0x1.4d89f9af532ep-109},
				{0x1.26b4565e27cddp0, 0x1.2bd339940e9d9p-55, 0x1.277393a461b77p-110},
				{0x1.29e9df51fdee1p0, 0x1.612e8afad1255p-55, 0x1.de5448560469p-111},
				{0x1.2d285a6e4030bp0, 0x1.0024754db41d5p-54, -0x1.ee9d8f8cb9307p-110},
				{0x1.306fe0a31b715p0, 0x1.6f46ad23182e4p-55, 0x1.7b7b2f09cd0d9p-110},
				{0x1.33c08b26416ffp0, 0x1.32721843659a6p-54, -0x1.406a2ea6cfc6bp-108},
				{0x1.371a7373aa9cbp0, -0x1.63aeabf42eae2p-54, 0x1.87e3e12516bfap-108},
				{0x1.3a7db34e59ff7p0, -0x1.5e436d661f5e3p-56, 0x1.9b0b1ff17c296p-111},
				{0x1.3dea64c123422p0, 0x1.ada0911f09ebcp-55, -0x1.808ba68fa8fb7p-109},
				{0x1.4160a21f72e2ap0, -0x1.ef3691c309278p-58, -0x1.32b43eafc6518p-114},
				{0x1.44e086061892dp0, 0x1.89b7a04ef80dp-59, -0x1.0ac312de3d922p-114},
				{0x1.486a2b5c13cdp0, 0x1.3c1a3b69062fp-56, 0x1.e1eebae743acp-111},
				{0x1.4bfdad5362a27p0, 0x1.d4397afec42e2p-56, 0x1.c06c7745c2b39p-113},
				{0x1.4f9b2769d2ca7p0, -0x1.4b309d25957e3p-54, -0x1.1aa1fd7b685cdp-112},
				{0x1.5342b569d4f82p0, -0x1.07abe1db13cadp-55, 0x1.fa733951f214cp-111},
				{0x1.56f4736b527dap0, 0x1.9bb2c011d93adp-54, -0x1.ff86852a613ffp-111},
				{0x1.5ab07dd485429p0, 0x1.6324c054647adp-54, -0x1.744ee506fdafep-109},
				{0x1.5e76f15ad2148p0, 0x1.ba6f93080e65ep-54, -0x1.95f9ab75fa7d6p-108},
				{0x1.6247eb03a5585p0, -0x1.383c17e40b497p-54, 0x1.5d8e757cfb991p-111},
				{0x1.6623882552225p0, -0x1.bb60987591c34p-54, 0x1.4a337f4dc0a3bp-108},
				{0x1.6a09e667f3bcdp0, -0x1.bdd3413b26456p-54, 0x1.57d3e3adec175p-108},
				{0x1.6dfb23c651a2fp0, -0x1.bbe3a683c88abp-57, 0x1.a59f88abbe778p-115},
				{0x1.71f75e8ec5f74p0, -0x1.16e4786887a99p-55, -0x1.269796953a4c3p-109},
				{0x1.75feb564267c9p0, -0x1.0245957316dd3p-54, -0x1.8f8e7fa19e5e8p-108},
				{0x1.7a11473eb0187p0, -0x1.41577ee04992fp-55, -0x1.4217a932d10d4p-113},
				{0x1.7e2f336cf4e62p0, 0x1.05d02ba15797ep-56, 0x1.70a1427f8fcdfp-112},
				{0x1.82589994cce13p0, -0x1.d4c1dd41532d8p-54, 0x1.0f6ad65cbbac1p-112},
				{0x1.868d99b4492edp0, -0x1.fc6f89bd4f6bap-54, -0x1.f16f65181d921p-109},
				{0x1.8ace5422aa0dbp0, 0x1.6e9f156864b27p-54, -0x1.30644a7836333p-110},
				{0x1.8f1ae99157736p0, 0x1.5cc13a2e3976cp-55, 0x1.3bf26d2b85163p-114},
				{0x1.93737b0cdc5e5p0, -0x1.75fc781b57ebcp-57, 0x1.697e257ac0db2p-111},
				{0x1.97d829fde4e5p0, -0x1.d185b7c1b85d1p-54, 0x1.7edb9d7144b6fp-108},
				{0x1.9c49182a3f09p0, 0x1.c7c46b071f2bep-56, 0x1.6376b7943085cp-110},
				{0x1.a0c667b5de565p0, -0x1.359495d1cd533p-54, 0x1.354084551b4fbp-109},
				{0x1.a5503b23e255dp0, -0x1.d2f6edb8d41e1p-54, -0x1.bfd7adfd63f48p-111},
				{0x1.a9e6b5579fdbfp0, 0x1.0fac90ef7fd31p-54, 0x1.8b16ae39e8cb9p-109},
				{0x1.ae89f995ad3adp0, 0x1.7a1cd345dcc81p-54, 0x1.a7fbc3ae675eap-108},
				{0x1.b33a2b84f15fbp0, -0x1.2805e3084d708p-57, 0x1.2babc0edda4d9p-111},
				{0x1.b7f76f2fb5e47p0, -0x1.5584f7e54ac3bp-56, 0x1.aa64481e1ab72p-111},
				{0x1.bcc1e904bc1d2p0, 0x1.23dd07a2d9e84p-55, 0x1.9a164050e1258p-109},
				{0x1.c199bdd85529cp0, 0x1.11065895048ddp-55, 0x1.99e51125928dap-110},
				{0x1.c67f12e57d14bp0, 0x1.2884dff483cadp-54, -0x1.fc44c329d5cb2p-109},
				{0x1.cb720dcef9069p0, 0x1.503cbd1e949dbp-56, 0x1.d8765566b032ep-110},
				{0x1.d072d4a07897cp0, -0x1.cbc3743797a9cp-54, -0x1.e7044039da0f6p-108},
				{0x1.d5818dcfba487p0, 0x1.2ed02d75b3707p-55, -0x1.ab053b05531fcp-111},
				{0x1.da9e603db3285p0, 0x1.c2300696db532p-54, 0x1.7f6246f0ec615p-108},
				{0x1.dfc97337b9b5fp0, -0x1.1a5cd4f184b5cp-54, 0x1.b7225a944efd6p-108},
				{0x1.e502ee78b3ff6p0, 0x1.39e8980a9cc8fp-55, 0x1.1e92cb3c2d278p-109},
				{0x1.ea4afa2a490dap0, -0x1.e9c23179c2893p-54, -0x1.fc0f242bbf3dep-109},
				{0x1.efa1bee615a27p0, 0x1.dc7f486a4b6bp-54, 0x1.f6dd5d229ff69p-108},
				{0x1.f50765b6e454p0, 0x1.9d3e12dd8a18bp-54, -0x1.4019bffc80ef3p-110},
				{0x1.fa7c1819e90d8p0, 0x1.74853f3a5931ep-55, 0x1.dc060c36f7651p-112},
				};

				// 2^(k * 2^-12), for k = 0..63.
				constexpr TripleDouble EXP_MID2[64] = {
				{0x1p0, 0, 0},
				{0x1.000b175effdc7p0, 0x1.ae8e38c59c72ap-54, 0x1.39726694630e3p-108},
				{0x1.00162f3904052p0, -0x1.7b5d0d58ea8f4p-58, 0x1.e5e06ddd31156p-112},
				{0x1.0021478e11ce6p0, 0x1.4115cb6b16a8ep-54, 0x1.5a0768b51f609p-111},
				{0x1.002c605e2e8cfp0, -0x1.d7c96f201bb2fp-55, 0x1.d008403605217p-111},
				{0x1.003779a95f959p0, 0x1.84711d4c35e9fp-54, 0x1.89bc16f765708p-109},
				{0x1.0042936faa3d8p0, -0x1.0484245243777p-55, -0x1.4535b7f8c1e2dp-109},
				{0x1.004dadb113dap0, -0x1.4b237da2025f9p-54, -0x1.8ba92f6b25456p-108},
				{0x1.0058c86da1c0ap0, -0x1.5e00e62d6b30dp-56, -0x1.30c72e81f4294p-113},
				{0x1.0063e3a559473p0, 0x1.a1d6cedbb9481p-54, -0x1.34a5384e6f0b9p-110},
				{0x1.006eff583fc3dp0, -0x1.4acf197a00142p-54, 0x1.f8d0580865d2ep-108},
				{0x1.007a1b865a8cap0, -0x1.eaf2ea42391a5p-57, -0x1.002bcb3ae9a99p-111},
				{0x1.0085382faef83p0, 0x1.da93f90835f75p-56, 0x1.c3c5aedee9851p-111},
				{0x1.00905554425d4p0, -0x1.6a79084ab093cp-55, 0x1.7217851d1ec6ep-109},
				{0x1.009b72f41a12bp0, 0x1.86364f8fbe8f8p-54, -0x1.80cbca335a7c3p-110},
				{0x1.00a6910f3b6fdp0, -0x1.82e8e14e3110ep-55, -0x1.706bd4eb22595p-110},
				{0x1.00b1afa5abcbfp0, -0x1.4f6b2a7609f71p-55, -0x1.b55dd523f3c08p-111},
				{0x1.00bcceb7707ecp0, -0x1.e1a258ea8f71bp-56, 0x1.90a1e207cced1p-110},
				{0x1.00c7ee448ee02p0, 0x1.4362ca5bc26f1p-56, 0x1.78d0472db37c5p-110},
				{0x1.00d30e4d0c483p0, 0x1.095a56c919d02p-54, -0x1.bcd4db3cb52fep-109},
				{0x1.00de2ed0ee0f5p0, -0x1.406ac4e81a645p-57, -0x1.cf1b131575ec2p-112},
				{0x1.00e94fd0398ep0, 0x1.b5a6902767e09p-54, -0x1.6aaa1fa7ff913p-112},
				{0x1.00f4714af41d3p0, -0x1.91b2060859321p-54, 0x1.68f236dff3218p-110},
				{0x1.00ff93412315cp0, 0x1.427068ab22306p-55, -0x1.e8bb58067e60ap-109},
				{0x1.010ab5b2cbd11p0, 0x1.c1d0660524e08p-54, 0x1.d4cd5e1d71fdfp-108},
				{0x1.0115d89ff3a8bp0, -0x1.e7bdfb3204be8p-54, 0x1.e4ecf350ebe88p-108},
				{0x1.0120fc089ff63p0, 0x1.843aa8b9cbbc6p-55, 0x1.6a2aa2c89c4f8p-109},
				{0x1.012c1fecd613bp0, -0x1.34104ee7edae9p-56, 0x1.1ca368a20ed05p-110},
				{0x1.0137444c9b5b5p0, -0x1.2b6aeb6176892p-56, 0x1.edb1095d925cfp-114},
				{0x1.01426927f5278p0, 0x1.a8cd33b8a1bb3p-56, -0x1.488c78eded75fp-111},
				{0x1.014d8e7ee8d2fp0, 0x1.2edc08e5da99ap-56, -0x1.7480f5ea1b3c9p-113},
				{0x1.0158b4517bb88p0, 0x1.57ba2dc7e0c73p-55, -0x1.ae45989a04dd5p-111},
				{0x1.0163da9fb3335p0, 0x1.b61299ab8cdb7p-54, 0x1.bf48007d80987p-109},
				{0x1.016f0169949edp0, -0x1.90565902c5f44p-54, 0x1.1aa91a059292cp-109},
				{0x1.017a28af25567p0, 0x1.70fc41c5c2d53p-55, 0x1.b6663292855f5p-110},
				{0x1.018550706ab62p0, 0x1.4b9a6e145d76cp-54, 0x1.e7fbca6793d94p-108},
				{0x1.019078ad6a19fp0, -0x1.008eff5142bf9p-56, -0x1.5b9f5c7de3b93p-110},
				{0x1.019ba16628de2p0, -0x1.77669f033c7dep-54, 0x1.4638bf2f6acabp-110},
				{0x1.01a6ca9aac5f3p0, -0x1.09bb78eeead0ap-54, -0x1.ab237b9a069c5p-109},
				{0x1.01b1f44af9f9ep0, 0x1.371231477ece5p-54, 0x1.3ab358be97cefp-108},
				{0x1.01bd1e77170b4p0, 0x1.5e7626621eb5bp-56, -0x1.4027b2294bb64p-110},
				{0x1.01c8491f08f08p0, -0x1.bc72b100828a5p-54, 0x1.656394426c99p-111},
				{0x1.01d37442d507p0, -0x1.ce39cbbab8bbep-57, 0x1.bf9785189bdd8p-111},
				{0x1.01de9fe280ac8p0, 0x1.16996709da2e2p-55, 0x1.7c12f86114fe3p-109},
				{0x1.01e9cbfe113efp0, -0x1.c11f5239bf535p-55, -0x1.653d5d24b5d28p-109},
				{0x1.01f4f8958c1c6p0, 0x1.e1d4eb5edc6b3p-55, 0x1.04a0cdc1d86d7p-109},
				{0x1.020025a8f6a35p0, -0x1.afb99946ee3fp-54, 0x1.c678c46149782p-109},
				{0x1.020b533856324p0, -0x1.8f06d8a148a32p-54, 0x1.48524e1e9df7p-108},
				{0x1.02168143b0281p0, -0x1.2bf310fc54eb6p-55, 0x1.9953ea727ff0bp-109},
				{0x1.0221afcb09e3ep0, -0x1.c95a035eb4175p-54, -0x1.ccfbbec22d28ep-108},
				{0x1.022cdece68c4fp0, -0x1.491793e46834dp-54, 0x1.9e2bb6e181de1p-108},
				{0x1.02380e4dd22adp0, -0x1.3e8d0d9c49091p-56, 0x1.f17609ae29308p-110},
				{0x1.02433e494b755p0, -0x1.314aa16278aa3p-54, -0x1.c7dc2c476bfb8p-110},
				{0x1.024e6ec0da046p0, 0x1.48daf888e9651p-55, -0x1.fab994971d4a3p-109},
				{0x1.02599fb483385p0, 0x1.56dc8046821f4p-55, 0x1.848b62cbdd0afp-109},
				{0x1.0264d1244c719p0, 0x1.45b42356b9d47p-54, -0x1.bf603ba715d0cp-109},
				{0x1.027003103b10ep0, -0x1.082ef51b61d7ep-56, 0x1.89434e751e1aap-110},
				{0x1.027b357854772p0, 0x1.2106ed0920a34p-56, -0x1.03b54fd64e8acp-110},
				{0x1.0286685c9e059p0, -0x1.fd4cf26ea5d0fp-54, 0x1.7785ea0acc486p-109},
				{0x1.02919bbd1d1d8p0, -0x1.09f8775e78084p-54, -0x1.ce447fdb35ff9p-109},
				{0x1.029ccf99d720ap0, 0x1.64cbba902ca27p-58, 0x1.5b884aab5642ap-112},
				{0x1.02a803f2d170dp0, 0x1.4383ef231d207p-54, -0x1.cfb3e46d7c1cp-108},
				{0x1.02b338c811703p0, 0x1.4a47a505b3a47p-54, -0x1.0d40cee4b81afp-112},
				{0x1.02be6e199c811p0, 0x1.e47120223467fp-54, 0x1.6ae7d36d7c1f7p-109},
				};

				// Polynomial approximations with double precision:
				// Return expm1(dx) / dx ~ 1 + dx / 2 + dx^2 / 6 + dx^3 / 24.
				// For |dx| < 2^-13 + 2^-30:
				// | output - expm1(dx) / dx | < 2^-51.
				LIBC_INLINE double poly_approx_d(double dx) {
				// dx^2
				double dx2 = dx * dx;
				// c0 = 1 + dx / 2
				double c0 = fputil::multiply_add(dx, 0.5, 1.0);
				// c1 = 1/6 + dx / 24
				double c1 =
				fputil::multiply_add(dx, 0x1.5555555555555p-5, 0x1.5555555555555p-3);
				// p = dx^2 * c1 + c0 = 1 + dx / 2 + dx^2 / 6 + dx^3 / 24
				double p = fputil::multiply_add(dx2, c1, c0);
				return p;
				}

				// Polynomial approximation with double-double precision:
				// Return exp(dx) ~ 1 + dx + dx^2 / 2 + ... + dx^6 / 720
				// For |dx| < 2^-13 + 2^-30:
				// | output - exp(dx) | < 2^-101
				DoubleDouble poly_approx_dd(const DoubleDouble &dx) {
				// Taylor polynomial.
				constexpr DoubleDouble COEFFS[] = {
				{0, 0x1p0}, // 1
				{0, 0x1p0}, // 1
				{0, 0x1p-1}, // 1/2
				{0x1.5555555555555p-57, 0x1.5555555555555p-3}, // 1/6
				{0x1.5555555555555p-59, 0x1.5555555555555p-5}, // 1/24
				{0x1.1111111111111p-63, 0x1.1111111111111p-7}, // 1/120
				{-0x1.f49f49f49f49fp-65, 0x1.6c16c16c16c17p-10}, // 1/720
				};

				DoubleDouble p = fputil::polyeval(dx, COEFFS[0], COEFFS[1], COEFFS[2],
				COEFFS[3], COEFFS[4], COEFFS[5], COEFFS[6]);
				return p;
				}

				// Polynomial approximation with 128-bit precision:
				// Return exp(dx) ~ 1 + dx + dx^2 / 2 + ... + dx^7 / 5040
				// For |dx| < 2^-13 + 2^-30:
				// | output - exp(dx) | < 2^-126.
				Float128 poly_approx_f128(const Float128 &dx) {
				using MType = typename Float128::MantissaType;

				constexpr Float128 COEFFS_128[]{
				{false, -127, MType({0, 0x8000000000000000})}, // 1.0
				{false, -127, MType({0, 0x8000000000000000})}, // 1.0
				{false, -128, MType({0, 0x8000000000000000})}, // 0.5
				{false, -130, MType({0xaaaaaaaaaaaaaaab, 0xaaaaaaaaaaaaaaaa})}, // 1/6
				{false, -132, MType({0xaaaaaaaaaaaaaaab, 0xaaaaaaaaaaaaaaaa})}, // 1/24
				{false, -134, MType({0x8888888888888889, 0x8888888888888888})}, // 1/120
				{false, -137, MType({0x60b60b60b60b60b6, 0xb60b60b60b60b60b})}, // 1/720
				{false, -140, MType({0x00b00b00b00b00b0, 0xb00b00b00b00b00b})}, // 1/5040
				};

				Float128 p = fputil::polyeval(dx, COEFFS_128[0], COEFFS_128[1], COEFFS_128[2],
				COEFFS_128[3], COEFFS_128[4], COEFFS_128[5],
				COEFFS_128[6], COEFFS_128[7]);
				return p;
				}

				// Compute exp(x) using 128-bit precision.
				// TODO(lntue): investigate triple-double precision implementation for this
				// step.
				Float128 exp_f128(double x, double kd, int idx1, int idx2) {
				// Recalculate dx:

				double t1 = fputil::multiply_add(kd, MLOG_2_EXP2_M12_HI, x); // exact
				double t2 = kd * MLOG_2_EXP2_M12_MID_30; // exact
				double t3 = kd * MLOG_2_EXP2_M12_LO; // Error < 2^-133

				Float128 dx = fputil::quick_add(
				Float128(t1), fputil::quick_add(Float128(t2), Float128(t3)));

				// TODO: Skip recalculating exp_mid1 and exp_mid2.
				Float128 exp_mid1 =
				fputil::quick_add(Float128(EXP_MID1[idx1].hi),
				fputil::quick_add(Float128(EXP_MID1[idx1].mid),
				Float128(EXP_MID1[idx1].lo)));

				Float128 exp_mid2 =
				fputil::quick_add(Float128(EXP_MID2[idx2].hi),
				fputil::quick_add(Float128(EXP_MID2[idx2].mid),
				Float128(EXP_MID2[idx2].lo)));

				Float128 exp_mid = fputil::quick_mul(exp_mid1, exp_mid2);

				Float128 p = poly_approx_f128(dx);

				Float128 r = fputil::quick_mul(exp_mid, p);

				r.exponent += static_cast<int>(kd) >> 12;

				return r;
				}

				// Compute exp(x) with double-double precision.
				DoubleDouble exp_double_double(double x, double kd,
				const DoubleDouble &exp_mid) {
				// Recalculate dx:
				// dx = x - k * 2^-12 * log(2)
				double t1 = fputil::multiply_add(kd, MLOG_2_EXP2_M12_HI, x); // exact
				double t2 = kd * MLOG_2_EXP2_M12_MID_30; // exact
				double t3 = kd * MLOG_2_EXP2_M12_LO; // Error < 2^-130

				DoubleDouble dx = fputil::exact_add(t1, t2);
				dx.lo += t3;

				// Degree-6 Taylor polynomial approximation in double-double precision.
				// | p - exp(dx) | < 2^-100.
				DoubleDouble p = poly_approx_dd(dx);

				// Error bounds: 2^-99.
				DoubleDouble r = fputil::quick_mult(exp_mid, p);

				return r;
				}

				// Rounding tests when the output might be denormal.
				cpp::optional<double> ziv_test_denorm(int hi, double mid, double lo,
				double err) {
				using FloatProp = typename fputil::FloatProperties<double>;

				// Scaling factor = 1/(min normal number) = 2^1022
				int64_t exp_hi = static_cast<int64_t>(hi + 1022) << FloatProp::MANTISSA_WIDTH;
				double mid_hi = cpp::bit_cast<double>(exp_hi + cpp::bit_cast<int64_t>(mid));

				// Extra errors from another rounding step.
				err += 0x1.0p-52;

				double lo_u = lo + err;
				double lo_l = lo - err;
				double mid_lo_u =
				cpp::bit_cast<double>(exp_hi + cpp::bit_cast<int64_t>(lo_u));
				double mid_lo_l =
				cpp::bit_cast<double>(exp_hi + cpp::bit_cast<int64_t>(lo_l));

				// By adding 2^-511, the results will have similar rounding points as denormal
				// outputs.
				double upper = (mid_hi + mid_lo_u);
				double lower = (mid_hi + mid_lo_l);

				uint64_t scale_down = 0;

				if (upper < 1.0) {
				// Upper bound is in denormal range, need extra rounding.
				upper += 1.0;
				lower += 1.0;
				scale_down = 0x3FF0'0000'0000'0000; // 1.0
				}

				if (LIBC_LIKELY(upper == lower)) {
				return cpp::bit_cast<double>(cpp::bit_cast<uint64_t>(upper) - scale_down);
				}

				return cpp::nullopt;
				}

				// Handle the exceptional cases: |x| <= 2^-53, underflow, overflow, and
				// inf/nan inputs.
				double set_exceptional(double x) {
				using FPBits = typename fputil::FPBits<double>;
				using FloatProp = typename fputil::FloatProperties<double>;
				FPBits xbits(x);

				uint64_t x_u = xbits.uintval();
				uint64_t x_abs = x_u & FloatProp::EXP_MANT_MASK;

				// |x| <= 2^-53
				if (x_abs <= 0x3ca0'0000'0000'0000ULL) {
				// exp(x) ~ 1 + x
				return 1 + x;
				}

				// x <= log(2^-1075) or x >= 0x1.62e42fefa39fp+9 or inf/nan.

				// x <= log(2^-1075) or -inf/nan
				if (x_u >= 0xc087'4910'd52d'3052ULL) {
				// exp(-Inf) = 0
				if (xbits.is_inf())
				return 0.0;

				// exp(nan) = nan
				if (xbits.is_nan())
				return x;

				if (fputil::quick_get_round() == FE_UPWARD)
				return static_cast<double>(FPBits(FPBits::MIN_SUBNORMAL));
				fputil::set_errno_if_required(ERANGE);
				fputil::raise_except_if_required(FE_UNDERFLOW);
				return 0.0;
				}

				// x >= round(log(MAX_NORMAL), D, RU) = 0x1.62e42fefa39fp+9 or +inf/nan
				// x is finite
				if (x_u < 0x7ff0'0000'0000'0000ULL) {
				int rounding = fputil::quick_get_round();
				if (rounding == FE_DOWNWARD || rounding == FE_TOWARDZERO)
				return static_cast<double>(FPBits(FPBits::MAX_NORMAL));

				fputil::set_errno_if_required(ERANGE);
				fputil::raise_except_if_required(FE_OVERFLOW);
				}
				// x is +inf or nan
				return x + static_cast<double>(FPBits::inf());
				}

				LLVM_LIBC_FUNCTION(double, exp, (double x)) {
				using FPBits = typename fputil::FPBits<double>;
				using FloatProp = typename fputil::FloatProperties<double>;
				FPBits xbits(x);

				uint64_t x_u = xbits.uintval();

				// Upper bound: max normal number = 2^1023 * (2 - 2^-52)
				// > round(log (2^1023 ( 2 - 2^-52 )), D, RU) = 0x1.62e42fefa39fp+9
				// > round(log (2^1023 ( 2 - 2^-52 )), D, RD) = 0x1.62e42fefa39efp+9
				// > round(log (2^1023 ( 2 - 2^-52 )), D, RN) = 0x1.62e42fefa39efp+9
				// > round(exp(0x1.62e42fefa39fp+9), D, RN) = infty

				// Lower bound: min denormal number / 2 = 2^-1075
				// > round(log(2^-1075), D, RN) = -0x1.74910d52d3052p9

				// Another lower bound: min normal number = 2^-1022
				// > round(log(2^-1022), D, RN) = -0x1.6232bdd7abcd2p9

				// x <= log(2^-1075) or x >= 0x1.62e42fefa39fp+9 or |x| < 2^-53.
				if (LIBC_UNLIKELY(x_u >= 0xc0874910d52d3052 ||
				(x_u < 0xbca0000000000000 && x_u >= 0x40862e42fefa39f0) ||
				x_u < 0x3ca0000000000000)) {
				return set_exceptional(x);
				}

				// Now log(2^-1075) < x <= -2^-53 or 2^-53 <= x < log(2^1023 * (2 - 2^-52))

				// Range reduction:
				// Let x = log(2) * (hi + mid1 + mid2) + lo
				// in which:
				// hi is an integer
				// mid1 * 2^6 is an integer
				// mid2 * 2^12 is an integer
				// then:
				// exp(x) = 2^hi * 2^(mid1) * 2^(mid2) * exp(lo).
				// With this formula:
				// - multiplying by 2^hi is exact and cheap, simply by adding the exponent
				// field.
				// - 2^(mid1) and 2^(mid2) are stored in 2 x 64-element tables.
				// - exp(lo) ~ 1 + lo + a0 * lo^2 + ...
				//
				// They can be defined by:
				// hi + mid1 + mid2 = 2^(-12) * round(2^12 * log_2(e) * x)
				// If we store L2E = round(log2(e), D, RN), then:
				// log2(e) - L2E ~ 1.5 * 2^(-56)
				// So the error when computing in double precision is:
				// | x * 2^12 * log_2(e) - D(x * 2^12 * L2E) | <=
				// <= | x * 2^12 * log_2(e) - x * 2^12 * L2E | +
				// + | x * 2^12 * L2E - D(x * 2^12 * L2E) |
				// <= 2^12 * ( |x| * 1.5 * 2^-56 + eps(x)) for RN
				// 2^12 * ( |x| * 1.5 * 2^-56 + 2*eps(x)) for other rounding modes.
				// So if:
				// hi + mid1 + mid2 = 2^(-12) * round(x * 2^12 * L2E) is computed entirely
				// in double precision, the reduced argument:
				// lo = x - log(2) * (hi + mid1 + mid2) is bounded by:
				// |lo| <= 2^-13 + (|x| * 1.5 * 2^-56 + 2*eps(x))
				// < 2^-13 + (1.5 * 2^9 * 1.5 * 2^-56 + 2*2^(9 - 52))
				// < 2^-13 + 2^-41
				//

				// The following trick computes round(x * L2E) more efficiently than the
				// rounding instructions, at the cost of slightly lower accuracy and hence
				// a slightly larger range for the reduced argument `lo`.
				//
				// To be precise, since |x| < |log(2^-1075)| < 1.5 * 2^9,
				// |x * 2^12 * L2E| < 1.5 * 2^9 * 2^12 * 1.5 < 2^23,
				// So we can fit the rounded result round(x * 2^12 * L2E) in int32_t.
				// Thus, the goal is to use one additional addition and a fixed-width
				// shift to get an int32_t representing round(x * 2^12 * L2E).
				//
				// Assuming int32_t uses two's-complement representation, and since the
				// mantissa of a double is unsigned with the leading bit hidden, if we add
				// an extra constant C = 2^e1 + 2^e2 with e1 > e2 >= 25 to the product, the
				// bits below 2^e2 in the resulting mantissa of (x*2^12*L2E + C) can be
				// read as a proper two's-complement representation of x*2^12*L2E.
				//
				// One small problem with this approach is that the sum (x*2^12*L2E + C) in
				// double precision is rounded to the least significant bit of the dominant
				// term C. In order to minimize the rounding errors from this addition, we
				// want to minimize e1. Another constraint that we want is that after
				// shifting the mantissa so that the least significant bit of int32_t
				// corresponds to the unit bit of (x*2^12*L2E), the sign is correct without
				// any adjustment. So combining these 2 requirements, we can choose
				// C = 2^33 + 2^32, so that the sign bit corresponds to the 2^31 bit, and
				// hence after right shifting the mantissa, the resulting int32_t has the
				// correct sign. With this choice of C, the number of mantissa bits we need
				// to shift to the right is: 52 - 33 = 19.
				//
				// Moreover, since integer right shifts are equivalent to rounding down,
				// we can add an extra 0.5 so that it becomes round-to-nearest,
				// tie-to-+infinity. So in particular, we can compute:
				// hmm = x * 2^12 * L2E + C,
				// where C = 2^33 + 2^32 + 2^-1, then if
				// k = int32_t(lower 51 bits of double(x * 2^12 * L2E + C) >> 19),
				// the reduced argument:
				// lo = x - log(2) * 2^-12 * k is bounded by:
				// |lo| <= 2^-13 + 2^-41 + 2^-12*2^-19
				// = 2^-13 + 2^-31 + 2^-41.
				//
				// Finally, notice that k only uses the mantissa of x * 2^12 * L2E, so the
				// exponent 2^12 is not needed. So we can simply define
				// C = 2^(33 - 12) + 2^(32 - 12) + 2^(-1 - 12), and
				// k = int32_t(lower 51 bits of double(x * L2E + C) >> 19).

				// Rounding errors <= 2^-31 + 2^-41.
				double tmp = fputil::multiply_add(x, LOG2_E, 0x1.8000'0000'4p21);
				int k = static_cast<int>(cpp::bit_cast<uint64_t>(tmp) >> 19);
				double kd = static_cast<double>(k);

				uint32_t idx1 = (k >> 6) & 0x3f;
				uint32_t idx2 = k & 0x3f;
				int hi = k >> 12;

				bool denorm = (hi <= -1022);

				DoubleDouble exp_mid1{EXP_MID1[idx1].mid, EXP_MID1[idx1].hi};
				DoubleDouble exp_mid2{EXP_MID2[idx2].mid, EXP_MID2[idx2].hi};

				DoubleDouble exp_mid = fputil::quick_mult(exp_mid1, exp_mid2);

				// |x - (hi + mid1 + mid2) * log(2) - dx| < 2^11 * eps(M_LOG_2_EXP2_M12.lo)
				// = 2^11 * 2^-13 * 2^-52
				// = 2^-54.
				// |dx| < 2^-13 + 2^-30.
				double lo_h = fputil::multiply_add(kd, MLOG_2_EXP2_M12_HI, x); // exact
				double dx = fputil::multiply_add(kd, MLOG_2_EXP2_M12_MID, lo_h);

				// We use the degree-4 Taylor polynomial to approximate exp(lo):
				// exp(lo) ~ 1 + lo + lo^2 / 2 + lo^3 / 6 + lo^4 / 24 = 1 + lo * P(lo)
				// So that the errors are bounded by:
				// |P(lo) - expm1(lo)/lo| < |lo|^4 / 64 < 2^(-13 * 4) / 64 = 2^-58
				// Let P_ be an evaluation of P where all intermediate computations are in
				// double precision. Using either Horner's or Estrin's schemes, the evaluated
				// errors can be bounded by:
				// |P_(dx) - P(dx)| < 2^-51
				// => |dx * P_(dx) - expm1(lo) | < 1.5 * 2^-64
				// => 2^(mid1 + mid2) * |dx * P_(dx) - expm1(lo)| < 1.5 * 2^-63.
				// Since we approximate
				// 2^(mid1 + mid2) ~ exp_mid.hi + exp_mid.lo,
				// We use the expression:
				// (exp_mid.hi + exp_mid.lo) * (1 + dx * P_(dx)) ~
				// ~ exp_mid.hi + (exp_mid.hi * dx * P_(dx) + exp_mid.lo)
				// with errors bounded by 1.5 * 2^-63.

				double mid_lo = dx * exp_mid.hi;

				// Approximate expm1(dx)/dx ~ 1 + dx / 2 + dx^2 / 6 + dx^3 / 24.
				double p = poly_approx_d(dx);

				double lo = fputil::multiply_add(p, mid_lo, exp_mid.lo);

				if (LIBC_UNLIKELY(denorm)) {
				if (auto r = ziv_test_denorm(hi, exp_mid.hi, lo, ERR_D);
				LIBC_LIKELY(r.has_value()))
				return r.value();
				} else {
				double upper = exp_mid.hi + (lo + ERR_D);
				double lower = exp_mid.hi + (lo - ERR_D);

				if (LIBC_LIKELY(upper == lower)) {
				// to multiply by 2^hi, a fast way is to simply add hi to the exponent
				// field.
				int64_t exp_hi = static_cast<int64_t>(hi) << FloatProp::MANTISSA_WIDTH;
				double r = cpp::bit_cast<double>(exp_hi + cpp::bit_cast<int64_t>(upper));
				return r;
				}
				}

				// Use double-double
				DoubleDouble r_dd = exp_double_double(x, kd, exp_mid);

				if (LIBC_UNLIKELY(denorm)) {
				if (auto r = ziv_test_denorm(hi, r_dd.hi, r_dd.lo, ERR_DD);
				LIBC_LIKELY(r.has_value()))
				return r.value();
				} else {
				double upper_dd = r_dd.hi + (r_dd.lo + ERR_DD);
				double lower_dd = r_dd.hi + (r_dd.lo - ERR_DD);

				if (LIBC_LIKELY(upper_dd == lower_dd)) {
				int64_t exp_hi = static_cast<int64_t>(hi) << FloatProp::MANTISSA_WIDTH;
				double r =
				cpp::bit_cast<double>(exp_hi + cpp::bit_cast<int64_t>(upper_dd));
				return r;
				}
				}

				// Use 128-bit precision
				Float128 r_f128 = exp_f128(x, kd, idx1, idx2);

				return static_cast<double>(r_f128);
				}

				} // namespace __llvm_libc
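
The rounding trick described in the comments of exp.cpp can be tried in isolation. The following is a minimal sketch, not the code from this patch: it uses `std::fma` and `std::memcpy` where the patch uses `fputil::multiply_add` and `cpp::bit_cast`, and the names `Reduced`, `MAGIC` and `reduce` are hypothetical. It assumes the default round-to-nearest mode, a two's-complement `int32_t`, and an arithmetic right shift for negative integers.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical container for k = round(x * 2^12 * log2(e)) and its pieces.
struct Reduced {
  int k;         // k = hi * 2^12 + idx1 * 2^6 + idx2
  int hi;        // integer part of the base-2 exponent
  unsigned idx1; // index for a table of 2^(i * 2^-6), i = 0..63
  unsigned idx2; // index for a table of 2^(i * 2^-12), i = 0..63
};

// log2(e) rounded to double, i.e. 0x1.71547652b82fep+0.
constexpr double LOG2_E = 1.4426950408889634;
// 2^21 + 2^20 + 2^-13, i.e. 0x1.8000'0000'4p21.
constexpr double MAGIC = 3145728.0 + 0.0001220703125;

inline Reduced reduce(double x) {
  // One fused multiply-add places round(x * 2^12 * log2(e)) in mantissa
  // bits 19..50 of tmp, already in two's-complement form.
  double tmp = std::fma(x, LOG2_E, MAGIC);
  std::uint64_t bits;
  std::memcpy(&bits, &tmp, sizeof(bits));
  // Drop the low 19 mantissa bits, then keep the low 32 bits of the rest.
  auto k = static_cast<std::int32_t>(static_cast<std::uint32_t>(bits >> 19));
  return {k, k >> 12, static_cast<unsigned>((k >> 6) & 0x3f),
          static_cast<unsigned>(k & 0x3f)};
}
```

For example, `reduce(1.0)` yields `k == 5909`, which splits into `hi == 1`, `idx1 == 28` and `idx2 == 21`, since 4096 + 28 * 64 + 21 = 5909.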

libc/test/src/math/CMakeLists.txt

add_fp_unittest(
DEPENDS		DEPENDS
libc.src.errno.errno		libc.src.errno.errno
libc.include.math		libc.include.math
libc.src.math.expf		libc.src.math.expf
libc.src.__support.FPUtil.fp_bits		libc.src.__support.FPUtil.fp_bits
)		)

add_fp_unittest(		add_fp_unittest(
		exp_test
		NEED_MPFR
		SUITE
		libc_math_unittests
		SRCS
		exp_test.cpp
		DEPENDS
		libc.src.errno.errno
		libc.include.math
		libc.src.math.exp
		libc.src.__support.FPUtil.fp_bits
		)

		add_fp_unittest(
exp2f_test		exp2f_test
NEED_MPFR		NEED_MPFR
SUITE		SUITE
libc_math_unittests		libc_math_unittests
SRCS		SRCS
exp2f_test.cpp		exp2f_test.cpp
DEPENDS		DEPENDS
libc.src.errno.errno		libc.src.errno.errno

libc/test/src/math/exp_test.cpp

This file was added.

				//===-- Unittests for exp -------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "src/__support/FPUtil/FPBits.h"
				#include "src/errno/libc_errno.h"
				#include "src/math/exp.h"
				#include "test/UnitTest/FPMatcher.h"
				#include "test/UnitTest/Test.h"
				#include "utils/MPFRWrapper/MPFRUtils.h"
				#include <math.h>

				#include <errno.h>
				#include <stdint.h>

				namespace mpfr = __llvm_libc::testing::mpfr;
				using __llvm_libc::testing::tlog;

				DECLARE_SPECIAL_CONSTANTS(double)

				TEST(LlvmLibcExpTest, SpecialNumbers) {
				EXPECT_FP_EQ(aNaN, __llvm_libc::exp(aNaN));
				EXPECT_FP_EQ(inf, __llvm_libc::exp(inf));
				EXPECT_FP_EQ_ALL_ROUNDING(zero, __llvm_libc::exp(neg_inf));
				EXPECT_FP_EQ_WITH_EXCEPTION(zero, __llvm_libc::exp(-0x1.0p20), FE_UNDERFLOW);
				EXPECT_FP_EQ_WITH_EXCEPTION(inf, __llvm_libc::exp(0x1.0p20), FE_OVERFLOW);
				EXPECT_FP_EQ_ALL_ROUNDING(1.0, __llvm_libc::exp(0.0));
				EXPECT_FP_EQ_ALL_ROUNDING(1.0, __llvm_libc::exp(-0.0));
				}

				TEST(LlvmLibcExpTest, TrickyInputs) {
				constexpr int N = 14;
				constexpr uint64_t INPUTS[N] = {
				0x3FD79289C6E6A5C0,
				0x3FD05DE80A173EA0, // 0x1.05de80a173eap-2
				0xbf1eb7a4cb841fcc, // -0x1.eb7a4cb841fccp-14
				0xbf19a61fb925970d,
				0x3fda7b764e2cf47a, // 0x1.a7b764e2cf47ap-2
				0xc04757852a4b93aa, // -0x1.757852a4b93aap+5
				0x4044c19e5712e377, // x=0x1.4c19e5712e377p+5
				0xbf19a61fb925970d, // x=-0x1.9a61fb925970dp-14
				0xc039a74cdab36c28, // x=-0x1.9a74cdab36c28p+4
				0xc085b3e4e2e3bba9, // x=-0x1.5b3e4e2e3bba9p+9
				0xc086960d591aec34, // x=-0x1.6960d591aec34p+9
				0xc086232c09d58d91, // x=-0x1.6232c09d58d91p+9
				0xc0874910d52d3051, // x=-0x1.74910d52d3051p9
				0xc0867a172ceb0990, // x=-0x1.67a172ceb099p+9
				};
				for (int i = 0; i < N; ++i) {
				double x = double(FPBits(INPUTS[i]));
				EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Exp, x, __llvm_libc::exp(x),
				0.5);
				}
				}

				TEST(LlvmLibcExpTest, InDoubleRange) {
				constexpr uint64_t COUNT = 1'231;
				uint64_t START = __llvm_libc::fputil::FPBits<double>(0.25).uintval();
				uint64_t STOP = __llvm_libc::fputil::FPBits<double>(4.0).uintval();
				uint64_t STEP = (STOP - START) / COUNT;

				auto test = [&](mpfr::RoundingMode rounding_mode) {
				mpfr::ForceRoundingMode __r(rounding_mode);
				if (!__r.success)
				return;

				uint64_t fails = 0;
				uint64_t count = 0;
				uint64_t cc = 0;
				double mx, mr = 0.0;
				double tol = 0.5;

				for (uint64_t i = 0, v = START; i <= COUNT; ++i, v += STEP) {
				double x = FPBits(v).get_val();
				if (isnan(x) || isinf(x) || x < 0.0)
				continue;
				libc_errno = 0;
				double result = __llvm_libc::exp(x);
				++cc;
				if (isnan(result) || isinf(result))
				continue;

				++count;
				// ASSERT_MPFR_MATCH(mpfr::Operation::Log, x, result, 0.5);
				if (!TEST_MPFR_MATCH_ROUNDING_SILENTLY(mpfr::Operation::Exp, x, result,
				0.5, rounding_mode)) {
				++fails;
				while (!TEST_MPFR_MATCH_ROUNDING_SILENTLY(mpfr::Operation::Exp, x,
				result, tol, rounding_mode)) {
				mx = x;
				mr = result;

				if (tol > 1000.0)
				break;

				tol *= 2.0;
				}
				}
				}
				tlog << " Exp failed: " << fails << "/" << count << "/" << cc
				<< " tests.\n";
				tlog << " Max ULPs is at most: " << static_cast<uint64_t>(tol) << ".\n";
				if (fails) {
				EXPECT_MPFR_MATCH(mpfr::Operation::Exp, mx, mr, 0.5, rounding_mode);
				}
				};

				tlog << " Test Rounding To Nearest...\n";
				test(mpfr::RoundingMode::Nearest);

				tlog << " Test Rounding Downward...\n";
				test(mpfr::RoundingMode::Downward);

				tlog << " Test Rounding Upward...\n";
				test(mpfr::RoundingMode::Upward);

				tlog << " Test Rounding Toward Zero...\n";
				test(mpfr::RoundingMode::TowardZero);
				}
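The InDoubleRange test above does not step through x values directly. It steps through their 64-bit encodings between the bit patterns of 0.25 and 4.0, so every binade receives the same number of samples. The following is a minimal standalone sketch of that sampling idea. The helper names (bits_to_double, double_to_bits, sample_by_bits) are hypothetical stand-ins written for illustration. They are not the FPBits API used by the test, and the sketch assumes double is IEEE 754 binary64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reinterpret a 64-bit pattern as a binary64 value.
inline double bits_to_double(std::uint64_t bits) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "binary64 expected");
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Hypothetical helper: the inverse of bits_to_double.
inline std::uint64_t double_to_bits(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Walk the encodings from `lo` towards `hi` in `count` equal integer steps.
// Returns count + 1 samples; the last one may fall slightly below `hi`
// because the step is rounded down by integer division.
inline std::vector<double> sample_by_bits(double lo, double hi,
                                          std::uint64_t count) {
  assert(count > 0 && lo > 0.0 && lo < hi);
  const std::uint64_t start = double_to_bits(lo);
  const std::uint64_t stop = double_to_bits(hi);
  const std::uint64_t step = (stop - start) / count;
  std::vector<double> xs;
  std::uint64_t v = start;
  for (std::uint64_t i = 0; i <= count; ++i, v += step)
    xs.push_back(bits_to_double(v));
  return xs;
}
```

For example, sample_by_bits(0.25, 4.0, 4) uses a step of 2^52, which is exactly one exponent increment, so the samples are 0.25, 0.5, 1.0, 2.0 and 4.0. With the test's count of 1231 the step is smaller, but the samples are still spread evenly across the four binades in terms of bit patterns.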

libc/test/src/math/log10_test.cpp

(27 lines not shown; context: TEST(LlvmLibcLog10Test, SpecialNumbers) {)
   EXPECT_FP_IS_NAN_WITH_EXCEPTION(__llvm_libc::log10(neg_inf), FE_INVALID);
   EXPECT_FP_EQ_WITH_EXCEPTION(neg_inf, __llvm_libc::log10(0.0), FE_DIVBYZERO);
   EXPECT_FP_EQ_WITH_EXCEPTION(neg_inf, __llvm_libc::log10(-0.0), FE_DIVBYZERO);
   EXPECT_FP_IS_NAN_WITH_EXCEPTION(__llvm_libc::log10(-1.0), FE_INVALID);
   EXPECT_FP_EQ_ALL_ROUNDING(zero, __llvm_libc::log10(1.0));
 }

 TEST(LlvmLibcLog10Test, TrickyInputs) {
-  constexpr int N = 35;
+  constexpr int N = 36;
   constexpr uint64_t INPUTS[N] = {
       0x3ff0000000000000, // x = 1.0
       0x4024000000000000, // x = 10.0
       0x4059000000000000, // x = 10^2
       0x408f400000000000, // x = 10^3
       0x40c3880000000000, // x = 10^4
       0x40f86a0000000000, // x = 10^5
       0x412e848000000000, // x = 10^6
(11 lines not shown; context: constexpr uint64_t INPUTS[N] = {)
       0x43abc16d674ec800, // x = 10^18
       0x43e158e460913d00, // x = 10^19
       0x4415af1d78b58c40, // x = 10^20
       0x444b1ae4d6e2ef50, // x = 10^21
       0x4480f0cf064dd592, // x = 10^22
       0x3fefffffffef06ad, 0x3fefde0f22c7d0eb, 0x225e7812faadb32f,
       0x3fee1076964c2903, 0x3fdfe93fff7fceb0, 0x3ff012631ad8df10,
       0x3fefbfdaa448ed98, 0x44b0c9705a25ce02, 0x2c88d301065c7f9b,
-      0x30160580e7268a99, 0x5ca04103b7eaa345, 0x19ad77dc4a40093f};
+      0x30160580e7268a99, 0x5ca04103b7eaa345, 0x19ad77dc4a40093f,
+      0x0000449fb5c8a96e};
   for (int i = 0; i < N; ++i) {
     double x = double(FPBits(INPUTS[i]));
     EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Log10, x,
                                    __llvm_libc::log10(x), 0.5);
   }
 }

 TEST(LlvmLibcLog10Test, AllExponents) {
(68 lines not shown)
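The only functional change to log10_test.cpp in this diff is one extra TrickyInputs entry, 0x0000449fb5c8a96e, with N raised from 35 to 36. The diff does not say why this input was chosen. One property that can be read off the bit pattern is that its exponent field is zero while its fraction is non-zero, so it encodes a subnormal double. The sketch below checks that property. The helper names (from_bits, is_subnormal_pattern) are hypothetical and written for illustration, and the sketch assumes double is IEEE 754 binary64.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 64-bit pattern as a binary64 value.
inline double from_bits(std::uint64_t bits) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "binary64 expected");
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Hypothetical helper: true when the 11-bit biased exponent (bits 62..52)
// is zero and the 52-bit fraction is non-zero, i.e. a subnormal encoding.
inline bool is_subnormal_pattern(std::uint64_t bits) {
  const std::uint64_t exponent = (bits >> 52) & 0x7FFULL;
  const std::uint64_t fraction = bits & 0x000FFFFFFFFFFFFFULL;
  return exponent == 0 && fraction != 0;
}
```

Calling is_subnormal_pattern(0x0000449fb5c8a96eULL) returns true, and std::fpclassify(from_bits(0x0000449fb5c8a96eULL)) reports FP_SUBNORMAL on a platform that does not flush subnormals to zero.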


Diff 552512
