# [libc][math] Improve sinhf and coshf performance.

Authored by lntue on Sep 14 2022, 10:02 PM.

# Details

Reviewers: michaelrj, sivachandra, orex, zimmermann6
Commits: rG1c89ae71ea69: [libc][math] Improve sinhf and coshf performance.

Summary:

Optimize sinhf and coshf by computing exp(x) and exp(-x) simultaneously.

Currently sinhf and coshf are implemented using the following formulas:

sinh(x) = 0.5 * (exp(x) - 1) - 0.5 * (exp(-x) - 1)
cosh(x) = 0.5 * exp(x) + 0.5 * exp(-x)
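For reference, the baseline formulas can be written directly with the C++ standard library. This is a minimal double-precision sketch, not the LLVM libc code. The names `sinh_baseline` and `cosh_baseline` are hypothetical, and `std::expm1` stands in for the separately computed exp(x) - 1 terms:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical baseline: two independent exponential evaluations per call.
// Using expm1 for sinh avoids cancellation near 0, where exp(x) and exp(-x)
// are both close to 1 and their plain difference would lose precision.
double sinh_baseline(double x) {
  return 0.5 * std::expm1(x) - 0.5 * std::expm1(-x);
}

// cosh has no cancellation problem, so plain exp is fine here.
double cosh_baseline(double x) {
  return 0.5 * std::exp(x) + 0.5 * std::exp(-x);
}
```

Each call performs two full exponential evaluations, and removing that duplicated work is the point of the patch.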

where exp(x) and exp(-x) are calculated separately by writing x = (hi + mid) * log(2) + dx and using the formula:

exp(x) ~ 2^hi * 2^mid * exp(dx)
       ~ 2^hi * 2^mid * P(dx)
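Here is a minimal sketch of this range reduction. For illustration only, it assumes mid is a multiple of 1/32, so that 2^mid could come from a 32-entry table. The sketch uses `std::exp2` in place of that table and `std::exp(dx)` in place of P(dx). The names `reduce` and `exp_reconstruct`, as well as the 1/32 granularity, are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical granularity: mid = j / 32 with 0 <= j < 32.
constexpr double kScale = 32.0;
constexpr double kLn2 = 0.6931471805599453;

struct Reduced {
  int hi;     // integer exponent
  int j;      // mid = j / 32
  double dx;  // remainder, |dx| <= log(2)/64
};

// Write x = (hi + j/32) * log(2) + dx with k = 32*hi + j rounded to nearest.
Reduced reduce(double x) {
  double k = std::nearbyint(x * kScale / kLn2);
  Reduced r;
  r.hi = static_cast<int>(std::floor(k / kScale));  // floor keeps 0 <= j < 32
  r.j = static_cast<int>(k) - r.hi * static_cast<int>(kScale);
  r.dx = x - k * (kLn2 / kScale);
  assert(std::fabs(r.dx) <= kLn2 / (2 * kScale) + 1e-12);
  return r;
}

// exp(x) ~ 2^hi * 2^mid * exp(dx); a real implementation would use a table
// for 2^mid and a polynomial P(dx) for exp(dx).
double exp_reconstruct(double x) {
  Reduced r = reduce(x);
  return std::ldexp(std::exp2(r.j / kScale) * std::exp(r.dx), r.hi);
}
```

Because k is rounded to the nearest integer, |dx| stays below log(2)/64, which is small enough for a short polynomial to approximate exp(dx).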

By splitting the polynomial P(dx) into even and odd parts, where P_even and P_odd both contain only even powers of dx,

P(dx) = P_even(dx) + dx * P_odd(dx)
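The even/odd split can be illustrated with a truncated Taylor series for exp(dx). This sketch assumes a degree-5 polynomial with Taylor coefficients; the coefficients used in the patch are not shown here. The point is that P(dx) and P(-dx) reuse the same two half-polynomials, which depend only on dx^2. The names `split_poly`, `p_pos` and `p_neg` are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Illustrative degree-5 Taylor polynomial for exp(dx), split into halves
// that are polynomials in dx^2 and therefore unchanged by dx -> -dx.
struct EvenOdd {
  double even;  // P_even(dx) = 1 + dx^2/2 + dx^4/24
  double odd;   // P_odd(dx)  = 1 + dx^2/6 + dx^4/120
};

EvenOdd split_poly(double dx) {
  double dx2 = dx * dx;
  EvenOdd p;
  p.even = std::fma(dx2, std::fma(dx2, 1.0 / 24, 0.5), 1.0);
  p.odd = std::fma(dx2, std::fma(dx2, 1.0 / 120, 1.0 / 6), 1.0);
  return p;
}

// P(dx) and P(-dx) differ only in the sign of the odd term.
double p_pos(double dx) {
  EvenOdd p = split_poly(dx);
  return p.even + dx * p.odd;
}

double p_neg(double dx) {
  EvenOdd p = split_poly(dx);
  return p.even - dx * p.odd;
}
```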

we can see that the computations of exp(x) and exp(-x) share most of their intermediate terms, namely:

exp(x)  ~ 2^(hi + mid) * (P_even(dx) + dx * P_odd(dx))
exp(-x) ~ 2^(-(hi + mid)) * (P_even(dx) - dx * P_odd(dx))
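The sign change in the second line can be checked directly. The reduction writes x = (hi + mid) * log(2) + dx, so negating x negates every term. P_even and P_odd are unchanged under dx -> -dx because they contain only even powers of dx:

$$
\begin{aligned}
-x &= -(hi + mid)\log 2 - dx, \\
e^{-x} &= 2^{-(hi + mid)}\, e^{-dx} \approx 2^{-(hi + mid)}\, P(-dx), \\
P(-dx) &= P_{even}(-dx) + (-dx)\, P_{odd}(-dx) = P_{even}(dx) - dx\, P_{odd}(dx).
\end{aligned}
$$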

Expanding sinh(x) and cosh(x) with the above formulas, we can compute these two functions as follows to maximize the shared parts:

sinh(x) = (e^x - e^(-x)) / 2
        ~ 0.5 * (P_even * (2^(hi + mid) - 2^(-(hi + mid))) +
                 dx * P_odd * (2^(hi + mid) + 2^(-(hi + mid))))
cosh(x) = (e^x + e^(-x)) / 2
        ~ 0.5 * (P_even * (2^(hi + mid) + 2^(-(hi + mid))) +
                 dx * P_odd * (2^(hi + mid) - 2^(-(hi + mid))))
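The sharing can be sketched end to end in double precision. This is an illustration under stated assumptions, not the committed sinhf/coshf code:

- mid is taken as a multiple of 1/32.
- `std::exp2` replaces the 2^(hi + mid) and 2^(-(hi + mid)) lookups.
- P is the degree-5 Taylor polynomial.
- Special cases are ignored.

`sinh_cosh_shared` is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>

struct SinhCosh {
  double sinh_v;
  double cosh_v;
};

// Hypothetical helper, not the LLVM libc implementation. Assumptions:
//  - hi + mid = k / 32 for an integer k;
//  - std::exp2 stands in for the 2^(hi + mid) and 2^(-(hi + mid)) lookups;
//  - P is the degree-5 Taylor polynomial of exp(dx);
//  - NaN, infinities and overflow are not handled.
SinhCosh sinh_cosh_shared(double x) {
  constexpr double kLn2 = 0.6931471805599453;
  // One range reduction: x = (hi + mid) * log(2) + dx, |dx| <= log(2)/64.
  double k = std::nearbyint(x * 32.0 / kLn2);
  double dx = x - k * (kLn2 / 32.0);
  // One pair of scale factors.
  double e_plus = std::exp2(k / 32.0);    // 2^(hi + mid)
  double e_minus = std::exp2(-k / 32.0);  // 2^(-(hi + mid))
  double sum = e_plus + e_minus;
  double diff = e_plus - e_minus;
  // One evaluation of the even and odd halves of P.
  double dx2 = dx * dx;
  double p_even = std::fma(dx2, std::fma(dx2, 1.0 / 24, 0.5), 1.0);
  double p_odd = std::fma(dx2, std::fma(dx2, 1.0 / 120, 1.0 / 6), 1.0);
  double dx_odd = dx * p_odd;
  // Only the final combinations differ between sinh and cosh.
  SinhCosh r;
  r.sinh_v = 0.5 * std::fma(p_even, diff, dx_odd * sum);
  r.cosh_v = 0.5 * std::fma(p_even, sum, dx_odd * diff);
  return r;
}
```

Both results come from the same shared work: one range reduction, one pair of scale factors, and one evaluation of P_even and P_odd. Only the last two lines, which swap the roles of the sum and the difference, are specific to sinh or cosh.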

So in this patch, we perform the following optimizations for sinhf and coshf:

1. Use the above formulas to maximize the sharing of intermediate results.
2. Apply optimizations similar to those in https://reviews.llvm.org/D133870.

Performance benchmarks using the perf tool from the CORE-MATH project on a Ryzen 1700:
For sinhf:

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 16.718
System LIBC reciprocal throughput : 63.151

BEFORE
LIBC reciprocal throughput        : 90.116
LIBC reciprocal throughput        : 28.554    (with -msse4.2 flag)
LIBC reciprocal throughput        : 22.577    (with -mfma flag)

AFTER
LIBC reciprocal throughput        : 36.482
LIBC reciprocal throughput        : 16.955    (with -msse4.2 flag)
LIBC reciprocal throughput        : 13.943    (with -mfma flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 48.821
System LIBC latency : 137.019

BEFORE
LIBC latency        : 97.122
LIBC latency        : 84.214    (with -msse4.2 flag)
LIBC latency        : 71.611    (with -mfma flag)

AFTER
LIBC latency        : 54.555
LIBC latency        : 50.865    (with -msse4.2 flag)
LIBC latency        : 48.700    (with -mfma flag)

For coshf:

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 16.939
System LIBC reciprocal throughput : 19.695

BEFORE
LIBC reciprocal throughput        : 52.845
LIBC reciprocal throughput        : 29.174    (with -msse4.2 flag)
LIBC reciprocal throughput        : 22.553    (with -mfma flag)

AFTER
LIBC reciprocal throughput        : 37.169
LIBC reciprocal throughput        : 17.805    (with -msse4.2 flag)
LIBC reciprocal throughput        : 14.691    (with -mfma flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 48.478
System LIBC latency : 48.044

BEFORE
LIBC latency        : 99.123
LIBC latency        : 85.595    (with -msse4.2 flag)
LIBC latency        : 72.776    (with -mfma flag)

AFTER
LIBC latency        : 57.760
LIBC latency        : 53.967    (with -msse4.2 flag)
LIBC latency        : 50.987    (with -mfma flag)
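As a worked example of reading the numbers above, the before/after ratios for the default build (no -msse4.2 or -mfma) can be computed directly. Smaller values are better, so a ratio above 1 is a speedup. The figures are copied from the logs above, and `Measurement`, `kDefaultBuild` and `speedup` are hypothetical names:

```cpp
#include <cassert>
#include <cstdio>

// Default-build figures from the perf logs above (rdtsc-based units).
struct Measurement {
  const char* name;
  double before;
  double after;
};

constexpr Measurement kDefaultBuild[] = {
    {"sinhf throughput", 90.116, 36.482},
    {"sinhf latency", 97.122, 54.555},
    {"coshf throughput", 52.845, 37.169},
    {"coshf latency", 99.123, 57.760},
};

// Lower is better for both metrics, so before/after is the speedup factor.
double speedup(const Measurement& m) { return m.before / m.after; }

void print_speedups() {
  for (const Measurement& m : kDefaultBuild)
    std::printf("%-17s %.2fx\n", m.name, speedup(m));
}
```

This gives roughly 2.47x (sinhf throughput), 1.78x (sinhf latency), 1.42x (coshf throughput) and 1.72x (coshf latency).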

### Event Timeline

lntue created this revision. Sep 14 2022, 10:02 PM
Herald added projects: Restricted Project, Restricted Project. Sep 14 2022, 10:02 PM
Herald added subscribers: ecnelises, tschuett, mgorny.
lntue requested review of this revision. Sep 14 2022, 10:02 PM
lntue edited the summary of this revision. Sep 14 2022, 10:46 PM
zimmermann6 accepted this revision. Sep 15 2022, 1:40 AM

This is very clever! I confirm the speed improvement (and that correct rounding still holds, checked by exhaustive search).

This revision is now accepted and ready to land. Sep 15 2022, 1:40 AM
orex accepted this revision. Sep 15 2022, 1:42 AM
This revision was automatically updated to reflect the committed changes.