This is an archive of the discontinued LLVM Phabricator instance.

Differential D127951

[libc][math] New algorithm for expf/expm1f/exp2f and for new functions sinhf/coshf.
Needs Revision · Public

Authored by orex on Jun 16 2022, 3:49 AM.

Details

Reviewers

lntue
sivachandra
zimmermann6

Summary

A new common algorithm for expf/expm1f/exp2f/sinhf/coshf is introduced:
1) Lookup table size for expf/expm1f/exp2f is reduced by a factor of 12.
2) Common algorithm for all 5 functions. The same code is used for expf/expm1f/sinhf/coshf.
3) Improved precision: the number of exceptional cases is reduced from 9 to 2 (+1 for sinhf).
4) More reliable algorithm. It uses pure mathematics and does not rely on Sollya
fitting, so the lookup table size, for example, is easy to change.
5) Perf tests show performance similar to the previous implementation.
6) Core-math performance tests are below (glibc 2.31):

Implementation	expf	expm1f	exp2f	sinhf	coshf
glibc	10.519	42.483	10.377	64.084	23.605
this patch	14.696	14.518	20.374	29.922	33.572
previous	15.003	12.291	27.201	---	---
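
Item 2 says one algorithm serves all five functions. The shared helper from the patch (`exp_eval` in `expxf.h`) is not included in this archive, so the following is only a sketch of the identities that make such sharing possible, sinh(x) = (e^x - e^-x)/2 and cosh(x) = (e^x + e^-x)/2. It uses `std::exp` in double precision as a stand-in, and the names `sinhf_sketch` and `coshf_sketch` are hypothetical.

```cpp
#include <cmath>

// Hypothetical stand-ins, not the patch's code: evaluate e^x once in double
// precision and derive both hyperbolic functions from that single value.
float sinhf_sketch(float x) {
  double e = std::exp(static_cast<double>(x));
  return static_cast<float>(0.5 * (e - 1.0 / e));
}

float coshf_sketch(float x) {
  double e = std::exp(static_cast<double>(x));
  return static_cast<float>(0.5 * (e + 1.0 / e));
}
```

For |x| close to zero the subtraction in `sinhf_sketch` cancels, so a production implementation would need a separate small-argument path. The sketch only illustrates why a single exponential evaluation can back several functions.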

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

orex created this revision. Jun 16 2022, 3:49 AM

Herald added projects: Restricted Project, Restricted Project. Jun 16 2022, 3:49 AM

Herald added subscribers: libc-commits, ecnelises, tschuett, mgorny.

orex edited the summary of this revision. Jun 16 2022, 3:49 AM

Harbormaster completed remote builds in B170228: Diff 437496. Jun 16 2022, 3:53 AM

orex edited the summary of this revision. Jun 16 2022, 4:48 AM

orex published this revision for review. Jun 16 2022, 5:03 AM

lntue added inline comments. Jun 16 2022, 7:11 AM

libc/src/math/generic/common_constants.cpp
106 ↗	(On Diff #437496)	Use hexadecimal floats for constants.
libc/src/math/generic/common_constants.h
20–21 ↗	(On Diff #437496)	Use all caps for constants.
libc/src/math/generic/exp2f.cpp
22 ↗	(On Diff #437496)	Use hexadecimal floats for constants and explain how these constants are generated. Also use caps and more descriptive names.
73 ↗	(On Diff #437496)	Maybe you can just inline `exval1`, `exval2`, and `exval_mask` here, with a comment explaining how `exval_mask` is obtained.
83–89 ↗	(On Diff #437496)	Add comments explaining the range reduction computations in detail.
91 ↗	(On Diff #437496)	Use hexadecimal floats for constants. Also, you might want to combine these a bit to take advantage of FMA when it's available: either `multiply_add(dx, polyeval(...), l2hdx);` or `polyeval(dx, l2hdx, l2l, ...);`. Those two are actually the same, including when there is no FMA. Performance tests should show whether it improves things when FMA is available.
98 ↗	(On Diff #437496)	Maybe combine this into `multiply_add(ml, pe, pe + ml + 1.0)` to take advantage of FMA if available.
libc/test/src/math/CMakeLists.txt
1199 ↗	(On Diff #437496)	Indentation.
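
The comment on line 91 of exp2f.cpp says that folding the leading term in with `multiply_add` and passing it to `polyeval` as the first coefficient are the same thing. A minimal sketch of why, assuming `polyeval` is a Horner evaluation built from fused multiply-adds: `polyeval_sketch` and the two wrapper functions are hypothetical stand-ins written with `std::fma`, not the `fputil` code, and the hexadecimal coefficients in the usage note are arbitrary (hexadecimal floating literals need C++17).

```cpp
#include <cmath>

// Horner evaluation of a0 + a1*x + a2*x^2 + ..., using one fused
// multiply-add per coefficient. Hypothetical stand-in for fputil::polyeval.
inline double polyeval_sketch(double, double a0) { return a0; }

template <typename... Rest>
inline double polyeval_sketch(double x, double a0, Rest... rest) {
  return std::fma(x, polyeval_sketch(x, rest...), a0);
}

// Form 1: the leading coefficient c0 is folded in with an explicit FMA.
double combined_with_multiply_add(double dx, double c0, double c1, double c2,
                                  double c3) {
  return std::fma(dx, polyeval_sketch(dx, c1, c2, c3), c0);
}

// Form 2: the leading coefficient c0 is handed to the polynomial evaluator.
double combined_in_polyeval(double dx, double c0, double c1, double c2,
                            double c3) {
  return polyeval_sketch(dx, c0, c1, c2, c3);
}
```

Both forms expand to the same chain of operations, so a call such as `combined_with_multiply_add(dx, 0x1p0, 0x1.8p-1, 0x1p-2, 0x1p-4)` returns the same bits as the matching `combined_in_polyeval` call. Any performance difference would come from how the real helpers are compiled, which only measurement can show.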

Added sinhf/coshf

Harbormaster completed remote builds in B172775: Diff 441028. Jun 29 2022, 8:59 AM

orex retitled this revision from "[libc][math] New common algorithm for expf/expm1f/exp2f." to "[libc][math] New algorithm for expf/expm1f/exp2f and for new functions sinhf/coshf.". Jun 29 2022, 10:00 AM

orex edited the summary of this revision.

Fix build problem.

Harbormaster completed remote builds in B172969: Diff 441306. Jun 30 2022, 1:47 AM

Rebased to latest main.

orex added a reviewer: zimmermann6. Jul 29 2022, 8:56 AM

Harbormaster completed remote builds in B178295: Diff 448644. Jul 29 2022, 8:57 AM

Paul (@zimmermann6) and Tue (@lntue),

can you check the performance of these two functions, expf/expm1f, on your systems? I have an idea to use these functions instead of the standard ones under the MinSizeRel build option. Reducing the size can be useful for embedded systems, for example. What do you think?

Sorry for the delay. It seems this does not compile properly with current main:

/localdisk/zimmerma/llvm-project/libc/src/math/generic/expm1f.cpp:11:10: fatal error: 'expxf.h' file not found
#include "expxf.h"
         ^~~~~~~~~

This revision now requires changes to proceed. Sep 21 2022, 12:09 AM

Revision Contents

Path	Size
libc/src/math/generic/expf.cpp	54 lines
libc/src/math/generic/expm1f.cpp	142 lines
Diff 448644

libc/src/math/generic/expf.cpp

	//===-- Single-precision e^x function -------------------------------------===//			//===-- Single-precision e^x function -------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "src/math/expf.h"			#include "src/math/expf.h"
	#include "common_constants.h" // Lookup tables EXP_M1 and EXP_M2.			#include "common_constants.h" // Lookup tables EXP_M1 and EXP_M2.
				#include "expxf.h"
	#include "src/__support/FPUtil/BasicOperations.h"			#include "src/__support/FPUtil/BasicOperations.h"
	#include "src/__support/FPUtil/FEnvImpl.h"			#include "src/__support/FPUtil/FEnvImpl.h"
	#include "src/__support/FPUtil/FPBits.h"			#include "src/__support/FPUtil/FPBits.h"
	#include "src/__support/FPUtil/PolyEval.h"			#include "src/__support/FPUtil/PolyEval.h"
	#include "src/__support/FPUtil/multiply_add.h"			#include "src/__support/FPUtil/multiply_add.h"
	#include "src/__support/FPUtil/nearest_integer.h"			#include "src/__support/FPUtil/nearest_integer.h"
	#include "src/__support/common.h"			#include "src/__support/common.h"

	#include <errno.h>			#include <errno.h>

	namespace __llvm_libc {			namespace __llvm_libc {

	LLVM_LIBC_FUNCTION(float, expf, (float x)) {			LLVM_LIBC_FUNCTION(float, expf, (float x)) {
	using FPBits = typename fputil::FPBits<float>;			using FPBits = typename fputil::FPBits<float>;
	FPBits xbits(x);			FPBits xbits(x);

	uint32_t x_u = xbits.uintval();			uint32_t x_u = xbits.uintval();
	uint32_t x_abs = x_u & 0x7fff'ffffU;			uint32_t x_abs = x_u & 0x7fff'ffffU;

	// Exceptional values
	if (unlikely(x_u == 0xc236'bd8cU)) { // x = -0x1.6d7b18p+5f
	return 0x1.108a58p-66f - x * 0x1.0p-95f;
	}

	// When |x| >= 89, |x| < 2^-25, or x is nan			// When |x| >= 89, |x| < 2^-25, or x is nan
	if (unlikely(x_abs >= 0x42b2'0000U || x_abs <= 0x3280'0000U)) {			if (unlikely(x_abs >= 0x42b2'0000U || x_abs <= 0x3280'0000U)) {
	// |x| < 2^-25			// |x| < 2^-25
	if (xbits.get_unbiased_exponent() <= 101) {			if (xbits.get_unbiased_exponent() <= 101) {
	return 1.0f + x;			return 1.0f + x;
	}			}

	// When x < log(2^-150) or nan			// When x < log(2^-150) or nan
	if (xbits.uintval() >= 0xc2cf'f1b5U) {			if (xbits.uintval() >= 0xc2cf'f1b5U) {
	// exp(-Inf) = 0			// exp(-Inf) = 0
	if (xbits.is_inf())			if (xbits.is_inf()) {
	return 0.0f;			return 0.0f;
				}
	// exp(nan) = nan			// exp(nan) = nan
	if (xbits.is_nan())			if (xbits.is_nan())
	return x;			return x;
	if (fputil::get_round() == FE_UPWARD)
	return static_cast<float>(FPBits(FPBits::MIN_SUBNORMAL));			if (unlikely(fputil::get_round() == FE_UPWARD))
				return FPBits(FPBits::MIN_SUBNORMAL).get_val();
	errno = ERANGE;			errno = ERANGE;
	return 0.0f;			return 0.0f;
	}			}
	// x >= 89 or nan			// x >= 89 or nan
	if (!xbits.get_sign() && (xbits.uintval() >= 0x42b2'0000)) {			if (!xbits.get_sign() && (xbits.uintval() >= 0x42b2'0000)) {
	// x is finite			// x is finite
	if (xbits.uintval() < 0x7f80'0000U) {			if (xbits.uintval() < 0x7f80'0000U) {
	int rounding = fputil::get_round();			int rounding = fputil::get_round();
	if (rounding == FE_DOWNWARD || rounding == FE_TOWARDZERO)			if (unlikely(rounding == FE_DOWNWARD || rounding == FE_TOWARDZERO))
	return static_cast<float>(FPBits(FPBits::MAX_NORMAL));			return FPBits(FPBits::MAX_NORMAL).get_val();

	errno = ERANGE;			errno = ERANGE;
	}			}
	// x is +inf or nan			// x is +inf or nan
	return x + static_cast<float>(FPBits::inf());			return x + FPBits::inf().get_val();
	}			}
	}			}
	// For -104 < x < 89, to compute exp(x), we perform the following range
	// reduction: find hi, mid, lo such that:			auto ep = __llvm_libc::exp_eval(x);
	// x = hi + mid + lo, in which			return fputil::multiply_add(ep.mult_exp, ep.r, ep.mult_exp);
	// hi is an integer,
	// mid * 2^7 is an integer
	// -2^(-8) <= lo < 2^-8.
	// In particular,
	// hi + mid = round(x * 2^7) * 2^(-7).
	// Then,
	// exp(x) = exp(hi + mid + lo) = exp(hi) * exp(mid) * exp(lo).
	// We store exp(hi) and exp(mid) in the lookup tables EXP_M1 and EXP_M2
	// respectively. exp(lo) is computed using a degree-4 minimax polynomial
	// generated by Sollya.

	// x_hi = (hi + mid) * 2^7 = round(x * 2^7).
	float kf = fputil::nearest_integer(x * 0x1.0p7f);
	// Subtract (hi + mid) from x to get lo.
	double xd = static_cast<double>(fputil::multiply_add(kf, -0x1.0p-7f, x));
	int x_hi = static_cast<int>(kf);
	x_hi += 104 << 7;
	// hi = x_hi >> 7
	double exp_hi = EXP_M1[x_hi >> 7];
	// mid * 2^7 = x_hi & 0x0000'007fU;
	double exp_mid = EXP_M2[x_hi & 0x7f];
	// Degree-4 minimax polynomial generated by Sollya with the following
	// commands:
	// > display = hexadecimal;
	// > Q = fpminimax(expm1(x)/x, 3, [|D...|], [-2^-8, 2^-8]);
	// > Q;
	double exp_lo =
	fputil::polyeval(xd, 0x1p0, 0x1.ffffffffff777p-1, 0x1.000000000071cp-1,
	0x1.555566668e5e7p-3, 0x1.55555555ef243p-5);
	return static_cast<float>(exp_hi * exp_mid * exp_lo);
	}			}

	} // namespace __llvm_libc			} // namespace __llvm_libc
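
The comments removed from expf.cpp describe the previous range reduction, x = hi + mid + lo. As a rough illustration of that scheme, the sketch below follows the same index arithmetic, but it calls `std::exp` where the original reads the precomputed tables `EXP_M1` and `EXP_M2`, and it uses a degree-4 Taylor polynomial where the original uses Sollya minimax coefficients. The function name is hypothetical, and the code assumes the default round-to-nearest mode and -104 < x < 89.

```cpp
#include <cmath>

// Sketch of the three-way split x = hi + mid + lo for -104 < x < 89.
// Assumption: std::exp replaces the EXP_M1 / EXP_M2 table lookups.
double exp_range_reduction_sketch(float x) {
  double xd = static_cast<double>(x);
  // k = round(x * 2^7), so hi + mid = k * 2^-7.
  double k = std::nearbyint(xd * 0x1.0p7);
  // lo = x - (hi + mid), so |lo| <= 2^-8.
  double lo = xd - k * 0x1.0p-7;
  // Bias by 104 * 2^7 so the index is non-negative before shifting.
  int index = static_cast<int>(k) + (104 << 7);
  int hi = (index >> 7) - 104; // integer part
  int mid_index = index & 0x7f; // mid * 2^7, in [0, 127]
  double exp_hi = std::exp(static_cast<double>(hi));
  double exp_mid = std::exp(mid_index * 0x1.0p-7);
  // Degree-4 Taylor polynomial for exp(lo), written in Horner form.
  double exp_lo =
      1.0 + lo * (1.0 + lo * (0.5 + lo * (1.0 / 6.0 + lo * (1.0 / 24.0))));
  return exp_hi * exp_mid * exp_lo;
}
```

With |lo| at most 2^-8, the truncated Taylor term is about lo^5/120, far below single-precision resolution, which is why a low-degree polynomial is enough after the table lookups.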

libc/src/math/generic/expm1f.cpp

	//===-- Single-precision e^x - 1 function ---------------------------------===//			//===-- Single-precision e^x - 1 function ---------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "src/math/expm1f.h"			#include "src/math/expm1f.h"
	#include "common_constants.h" // Lookup tables EXP_M1 and EXP_M2.			#include "common_constants.h" // Lookup tables EXP_M1 and EXP_M2.
				#include "expxf.h"
	#include "src/__support/FPUtil/BasicOperations.h"			#include "src/__support/FPUtil/BasicOperations.h"
	#include "src/__support/FPUtil/FEnvImpl.h"			#include "src/__support/FPUtil/FEnvImpl.h"
	#include "src/__support/FPUtil/FMA.h"			#include "src/__support/FPUtil/FMA.h"
	#include "src/__support/FPUtil/FPBits.h"			#include "src/__support/FPUtil/FPBits.h"
	#include "src/__support/FPUtil/PolyEval.h"			#include "src/__support/FPUtil/PolyEval.h"
	#include "src/__support/FPUtil/multiply_add.h"			#include "src/__support/FPUtil/multiply_add.h"
	#include "src/__support/FPUtil/nearest_integer.h"			#include "src/__support/FPUtil/nearest_integer.h"
	#include "src/__support/common.h"			#include "src/__support/common.h"

	#include <errno.h>			#include <errno.h>

	namespace __llvm_libc {			namespace __llvm_libc {

	LLVM_LIBC_FUNCTION(float, expm1f, (float x)) {			LLVM_LIBC_FUNCTION(float, expm1f, (float x)) {
	using FPBits = typename fputil::FPBits<float>;			using FPBits = typename fputil::FPBits<float>;
	FPBits xbits(x);			FPBits xbits(x);

	uint32_t x_u = xbits.uintval();			uint32_t x_u = xbits.uintval();
	uint32_t x_abs = x_u & 0x7fff'ffffU;			uint32_t x_abs = x_u & 0x7fff'ffffU;

	// Exceptional value			// When |x| >= 89, |x| < 2^-25, or x is nan or x < -18
	if (unlikely(x_u == 0x3e35'bec5U)) { // x = 0x1.6b7d8ap-3f			if (unlikely(x_abs >= 0x42b2'0000U || x_abs <= 0x3280'0000U ||
	int round_mode = fputil::get_round();			x_u >= 0xc190'0000U)) {
	if (round_mode == FE_TONEAREST || round_mode == FE_UPWARD)			// |x| < 2^-25
	return 0x1.8dbe64p-3f;			if (xbits.get_unbiased_exponent() <= 101) {
	return 0x1.8dbe62p-3f;			return unlikely(x_abs == 0) ? x : (x + 0.5 * x * x);
	}			}

	#if !defined(LIBC_TARGET_HAS_FMA)			// When x < log(2^-150) or nan
	if (unlikely(x_u == 0xbdc1'c6cbU)) { // x = -0x1.838d96p-4f			if (xbits.uintval() >= 0xc180'0000U) {
	int round_mode = fputil::get_round();
	if (round_mode == FE_TONEAREST || round_mode == FE_DOWNWARD)
	return -0x1.71c884p-4f;
	return -0x1.71c882p-4f;
	}
	#endif // LIBC_TARGET_HAS_FMA

	// When |x| > 25*log(2), or nan
	if (unlikely(x_abs >= 0x418a'a123U)) {
	// x < log(2^-25)
	if (xbits.get_sign()) {
	// exp(-Inf) = 0			// exp(-Inf) = 0
	if (xbits.is_inf())			if (xbits.is_inf()) {
	return -1.0f;			return -1.0f;
				}

	// exp(nan) = nan			// exp(nan) = nan
	if (xbits.is_nan())			if (xbits.is_nan())
	return x;			return x;
	int round_mode = fputil::get_round();
	if (round_mode == FE_UPWARD || round_mode == FE_TOWARDZERO)			return -1.0f + opt_barrier(FPBits(FPBits::MIN_NORMAL).get_val());
	return -0x1.ffff'fep-1f; // -1.0f + 0x1.0p-24f			}
	return -1.0f;
	} else {
	// x >= 89 or nan			// x >= 89 or nan
	if (xbits.uintval() >= 0x42b2'0000) {			if (!xbits.get_sign() && (xbits.uintval() >= 0x42b2'0000)) {
				// x is finite
	if (xbits.uintval() < 0x7f80'0000U) {			if (xbits.uintval() < 0x7f80'0000U) {
	int rounding = fputil::get_round();			int rounding = fputil::get_round();
	if (rounding == FE_DOWNWARD || rounding == FE_TOWARDZERO)			if (unlikely(rounding == FE_DOWNWARD || rounding == FE_TOWARDZERO))
	return static_cast<float>(FPBits(FPBits::MAX_NORMAL));			return FPBits(FPBits::MAX_NORMAL).get_val();

	errno = ERANGE;			errno = ERANGE;
	}			}
	return x + static_cast<float>(FPBits::inf());			// x is +inf or nan
	}			return x + FPBits::inf().get_val();
	}			}
	}			}

	// |x| < 2^-4			auto ep = __llvm_libc::exp_eval(x);
	if (x_abs < 0x3d80'0000U) {			return fputil::multiply_add(ep.mult_exp, ep.r, ep.mult_exp - 1.0);
	// |x| < 2^-25
	if (x_abs < 0x3300'0000U) {
	// x = -0.0f
	if (unlikely(xbits.uintval() == 0x8000'0000U))
	return x;
	// When |x| < 2^-25, the relative error of the approximation e^x - 1 ~ x
	// is:
	// |(e^x - 1) - x| / |e^x - 1| < |x^2| / |x|
	// = |x|
	// < 2^-25
	// < epsilon(1)/2.
	// So the correctly rounded values of expm1(x) are:
	// = x + eps(x) if rounding mode = FE_UPWARD,
	// or (rounding mode = FE_TOWARDZERO and x is
	// negative),
	// = x otherwise.
	// To simplify the rounding decision and make it more efficient, we use
	// fma(x, x, x) ~ x + x^2 instead.
	// Note: to use the formula x + x^2 to decide the correct rounding, we
	// do need fma(x, x, x) to prevent underflow caused by x*x when |x| <
	// 2^-76. For targets without FMA instructions, we simply use double for
	// intermediate results as it is more efficient than using an emulated
	// version of FMA.
	#if defined(LIBC_TARGET_HAS_FMA)
	return fputil::fma(x, x, x);
	#else
	double xd = x;
	return static_cast<float>(fputil::multiply_add(xd, xd, xd));
	#endif // LIBC_TARGET_HAS_FMA
	}

	// 2^-25 <= |x| < 2^-4
	double xd = static_cast<double>(x);
	double xsq = xd * xd;
	// Degree-8 minimax polynomial generated by Sollya with:
	// > display = hexadecimal;
	// > P = fpminimax((expm1(x) - x)/x^2, 6, [|D...|], [-2^-4, 2^-4]);
	double r =
	fputil::polyeval(xd, 0x1p-1, 0x1.55555555557ddp-3, 0x1.55555555552fap-5,
	0x1.111110fcd58b7p-7, 0x1.6c16c1717660bp-10,
	0x1.a0241f0006d62p-13, 0x1.a01e3f8d3c06p-16);
	return static_cast<float>(fputil::multiply_add(r, xsq, xd));
	}

	// For -18 < x < 89, to compute expm1(x), we perform the following range
	// reduction: find hi, mid, lo such that:
	// x = hi + mid + lo, in which
	// hi is an integer,
	// mid * 2^7 is an integer
	// -2^(-8) <= lo < 2^-8.
	// In particular,
	// hi + mid = round(x * 2^7) * 2^(-7).
	// Then,
	// expm1(x) = exp(hi + mid + lo) - 1 = exp(hi) * exp(mid) * exp(lo) - 1.
	// We store exp(hi) and exp(mid) in the lookup tables EXP_M1 and EXP_M2
	// respectively. exp(lo) is computed using a degree-4 minimax polynomial
	// generated by Sollya.

	// x_hi = hi + mid.
	float kf = fputil::nearest_integer(x * 0x1.0p7f);
	int x_hi = static_cast<int>(kf);
	// Subtract (hi + mid) from x to get lo.
	double xd = static_cast<double>(fputil::multiply_add(kf, -0x1.0p-7f, x));
	x_hi += 104 << 7;
	// hi = x_hi >> 7
	double exp_hi = EXP_M1[x_hi >> 7];
	// lo = x_hi & 0x0000'007fU;
	double exp_mid = EXP_M2[x_hi & 0x7f];
	double exp_hi_mid = exp_hi * exp_mid;
	// Degree-4 minimax polynomial generated by Sollya with the following
	// commands:
	// > display = hexadecimal;
	// > Q = fpminimax(expm1(x)/x, 3, [|D...|], [-2^-8, 2^-8]);
	// > Q;
	double exp_lo =
	fputil::polyeval(xd, 0x1.0p0, 0x1.ffffffffff777p-1, 0x1.000000000071cp-1,
	0x1.555566668e5e7p-3, 0x1.55555555ef243p-5);
	return static_cast<float>(fputil::multiply_add(exp_hi_mid, exp_lo, -1.0));
	}			}

	} // namespace __llvm_libc			} // namespace __llvm_libc
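
The removed |x| < 2^-25 branch of expm1f.cpp computes x + x^2 so that the hardware rounding decides between x and its neighbour, and it returns -0.0f unchanged. The following is a small sketch of the non-FMA variant of that branch, written with standard C++ only; the function name is hypothetical, and the double-precision arithmetic stands in for `fputil::multiply_add`.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Sketch of the tiny-argument path: evaluate x + x^2 in double so that x*x
// cannot underflow, then round to float.
float expm1f_tiny_sketch(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  // -0.0f is returned as-is: (-0)*(-0) + (-0) would otherwise give +0
  // in round-to-nearest and lose the sign.
  if (bits == 0x8000'0000U)
    return x;
  double xd = x;
  return static_cast<float>(xd * xd + xd);
}
```

In the default round-to-nearest mode this returns x itself for every |x| < 2^-25, since x^2 is far below half a unit in the last place of x. The extra x^2 term only changes the result under directed rounding modes, which is the case the original comment is concerned with.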