This is an archive of the discontinued LLVM Phabricator instance.

[libc] Use nearest_integer instructions to improve expf performance.
ClosedPublic

Authored by lntue on Jul 25 2022, 9:26 AM.


Details

Reviewers

michaelrj
sivachandra
orex
zimmermann6

Commits

rG91ee67206289: [libc] Use nearest_integer instructions to improve expf performance.

Summary

Use nearest_integer instructions to improve expf performance.

Performance tests with CORE-MATH's perf tool:

Before the patch:

$ ./perf.sh expf
LIBC-location: /home/lnt/experiment/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH reciprocal throughput   : 9.860
System LIBC reciprocal throughput : 7.728
LIBC reciprocal throughput        : 12.363

$ ./perf.sh expf --latency
LIBC-location: /home/lnt/experiment/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH latency   : 42.802
System LIBC latency : 35.941
LIBC latency        : 49.808

After the patch:

$ ./perf.sh expf
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH reciprocal throughput   : 9.441
System LIBC reciprocal throughput : 7.382
LIBC reciprocal throughput        : 8.843

$ ./perf.sh expf --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH latency   : 44.192
System LIBC latency : 37.693
LIBC latency        : 44.145
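
To put the before/after figures side by side, the relative change can be computed directly. The helper below is a hypothetical illustration (the name `percent_reduction` is not part of the perf tool or the patch); it only restates the arithmetic on the numbers reported above.

```cpp
// Hypothetical helper: relative reduction, in percent, when a measurement
// drops from `before` to `after`.
inline double percent_reduction(double before, double after) {
  return 100.0 * (before - after) / before;
}
```

With the LIBC figures above, `percent_reduction(12.363, 8.843)` comes to roughly 28% for reciprocal throughput, and `percent_reduction(49.808, 44.145)` to roughly 11% for latency.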

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lntue created this revision. Jul 25 2022, 9:26 AM

Herald added projects: Restricted Project, Restricted Project. Jul 25 2022, 9:26 AM

Herald added subscribers: ecnelises, tschuett, mgorny.

Update math status page.

lntue edited the summary of this revision. Jul 25 2022, 9:30 AM

Harbormaster completed remote builds in B177405: Diff 447373. Jul 25 2022, 9:34 AM

michaelrj added inline comments. Jul 25 2022, 10:47 AM

libc/src/math/generic/CMakeLists.txt
Line 486: You should probably also explicitly include multiply_add.

Add multiply_add to dependency.

lntue marked an inline comment as done. Jul 25 2022, 11:14 AM

Harbormaster completed remote builds in B177432: Diff 447412. Jul 25 2022, 11:20 AM

I confirm it is still correctly rounded, and now faster than CORE-MATH. Nice work!

This revision is now accepted and ready to land. Jul 26 2022, 1:09 AM

Closed by commit rG91ee67206289: [libc] Use nearest_integer instructions to improve expf performance. (authored by lntue). Jul 26 2022, 6:11 AM

This revision was automatically updated to reflect the committed changes.

lntue added a commit: rG91ee67206289: [libc] Use nearest_integer instructions to improve expf performance.

Revision Contents

Path                                    Size
libc/docs/math.rst                      2 lines
libc/src/math/generic/CMakeLists.txt    1 line
libc/src/math/generic/expf.cpp          12 lines

Diff 447373

libc/docs/math.rst

(193 lines not shown)
 +--------------+-------------------------------+-------------------------------+-------------------------------------+---------------------------------------------------------------------+
 | <Func>       | Reciprocal throughput (ns)    | Latency (ns)                  | Testing ranges                      | Testing configuration                                               |
 |              +-----------+-------------------+-----------+-------------------+                                     +------------+-------------------------+--------------+---------------+
 |              | LLVM libc | Reference (glibc) | LLVM libc | Reference (glibc) |                                     | CPU        | OS                      | Compiler     | Special flags |
 +==============+===========+===================+===========+===================+=====================================+============+=========================+==============+===============+
 | cosf         | 37        | 32                | 73        | 72                | :math:`[0, 2\pi]`                   | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 |               |
 +--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
-| expf         | 14        | 9                 | 58        | 42                | :math:`[-10, 10]`                   | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
+| expf         | 9         | 7                 | 44        | 38                | :math:`[-10, 10]`                   | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
 +--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
 | exp2f        | 25        | 8                 | 81        | 37                | :math:`[-10, 10]`                   | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
 +--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
 | expm1f       | 14        | 53                | 59        | 146               | :math:`[-10, 10]`                   | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA           |
 +--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
 | fmodf        | 73        | 263               | -         | -                 | [MIN_NORMAL, MAX_NORMAL]            | i5 mobile  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 |               |
 |              +-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
 |              | 9         | 11                | -         | -                 | [0, MAX_SUBNORMAL]                  | i5 mobile  | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 |               |
(27 lines not shown)

libc/src/math/generic/CMakeLists.txt

(477 lines not shown)
 add_entrypoint_object(
   expf
   SRCS
     expf.cpp
   HDRS
     ../expf.h
   DEPENDS
     .common_constants
     libc.src.__support.FPUtil.fputil
+    libc.src.__support.FPUtil.nearest_integer
     libc.src.__support.FPUtil.polyeval
     libc.include.math
   COMPILE_OPTIONS
     -O3
 )

 add_entrypoint_object(
   exp2f
(630 lines not shown)

libc/src/math/generic/expf.cpp

 //===-- Single-precision e^x function -------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #include "src/math/expf.h"
 #include "common_constants.h" // Lookup tables EXP_M1 and EXP_M2.
 #include "src/__support/FPUtil/BasicOperations.h"
 #include "src/__support/FPUtil/FEnvImpl.h"
-#include "src/__support/FPUtil/FMA.h"
 #include "src/__support/FPUtil/FPBits.h"
 #include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
 #include "src/__support/common.h"

 #include <errno.h>

 namespace __llvm_libc {

 LLVM_LIBC_FUNCTION(float, expf, (float x)) {
   using FPBits = typename fputil::FPBits<float>;
(51 lines not shown)
   // hi + mid = round(x * 2^7) * 2^(-7).
   // Then,
   // exp(x) = exp(hi + mid + lo) = exp(hi) * exp(mid) * exp(lo).
   // We store exp(hi) and exp(mid) in the lookup tables EXP_M1 and EXP_M2
   // respectively. exp(lo) is computed using a degree-4 minimax polynomial
   // generated by Sollya.

   // x_hi = (hi + mid) * 2^7 = round(x * 2^7).
-  // The default rounding mode for float-to-int conversion in C++ is
-  // round-toward-zero. To make it round-to-nearest, we add (-1)^sign(x) * 0.5
-  // before conversion.
-  int x_hi = static_cast<int>(x * 0x1.0p7f + (xbits.get_sign() ? -0.5f : 0.5f));
+  float kf = fputil::nearest_integer(x * 0x1.0p7f);
   // Subtract (hi + mid) from x to get lo.
-  x -= static_cast<float>(x_hi) * 0x1.0p-7f;
-  double xd = static_cast<double>(x);
+  double xd = static_cast<double>(fputil::multiply_add(kf, -0x1.0p-7f, x));
+  int x_hi = static_cast<int>(kf);
   x_hi += 104 << 7;
   // hi = x_hi >> 7
   double exp_hi = EXP_M1[x_hi >> 7];
   // mid * 2^7 = x_hi & 0x0000'007fU;
   double exp_mid = EXP_M2[x_hi & 0x7f];
   // Degree-4 minimax polynomial generated by Sollya with the following
   // commands:
   // > display = hexadecimal;
(9 lines not shown)
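
The range reduction described in the expf.cpp comments can be tried outside the libc tree with a minimal sketch like the one below. It is not the libc implementation: `Reduced`, `reduce`, and `exp_by_parts` are hypothetical names, `std::nearbyint` and `std::fma` stand in for `fputil::nearest_integer` and `fputil::multiply_add`, and `std::exp` stands in for the EXP_M1 / EXP_M2 table lookups and the degree-4 polynomial. It assumes |x| < 104 and the default round-to-nearest mode.

```cpp
#include <cmath>

// Hypothetical illustration only; these names are not from the libc sources.
struct Reduced {
  int hi;     // integer part of round(x * 2^7) * 2^-7
  int mid;    // 7-bit index, so mid * 2^-7 lies in [0, 1)
  double lo;  // remainder x - (hi + mid * 2^-7), with |lo| <= 2^-8
};

// Split x as hi + mid * 2^-7 + lo. Assumes |x| < 104 so that the biased
// index stays non-negative, mirroring the `104 << 7` offset in the patch.
inline Reduced reduce(float x) {
  // Round x * 2^7 to an integer in the current rounding mode
  // (round-to-nearest by default); 128.0f is 2^7.
  float kf = std::nearbyint(x * 128.0f);
  // lo = x - kf * 2^-7, computed with one fused multiply-add;
  // 0.0078125f is 2^-7.
  float lo = std::fma(kf, -0.0078125f, x);
  int k = static_cast<int>(kf) + (104 << 7);
  return Reduced{(k >> 7) - 104, k & 0x7f, static_cast<double>(lo)};
}

// exp(x) = exp(hi) * exp(mid * 2^-7) * exp(lo). Here std::exp replaces the
// table lookups and the polynomial, so this only checks the decomposition.
inline double exp_by_parts(float x) {
  Reduced r = reduce(x);
  return std::exp(static_cast<double>(r.hi)) *
         std::exp(r.mid * 0.0078125) *
         std::exp(r.lo);
}
```

For example, `reduce(-0.5f)` yields hi = -1, mid = 64, lo = 0, since -1 + 64 * 2^-7 = -0.5. Rounding once with a nearest-integer operation and then forming lo with a single multiply-add avoids the sign-dependent 0.5 adjustment and the separate int-to-float conversion that the removed lines needed.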