This is an archive of the discontinued LLVM Phabricator instance.

[libc] Implement tanf function correctly rounded for all rounding modes.
ClosedPublic

Authored by lntue on Aug 11 2022, 1:30 PM.

Details

Summary

Implement tanf function correctly rounded for all rounding modes.

We use the range reduction that is shared with sinf, cosf, and sincosf:

k = round(x * 32/pi) and y = x * (32/pi) - k.

Then we use the tangent of sum formula:

tan(x) = tan((k + y)* pi/32) = tan((k mod 32) * pi / 32 + y * pi/32)
       = (tan((k mod 32) * pi/32) + tan(y * pi/32)) / (1 - tan((k mod 32) * pi/32) * tan(y * pi/32))

We need to make a further reduction when k mod 32 >= 16 due to the pole at pi/2 of tan(x) function:

if (k mod 32 >= 16): k = k - 31, y = y - 1.0

And to compute the final result, we store tan(k * pi/32) for k = -15..15 in a table of 32 double values,
and evaluate tan(y * pi/32) with a degree-11 minimax odd polynomial generated by Sollya with:

>  P = fpminimax(tan(y * pi/32)/y, [|0, 2, 4, 6, 8, 10|], [|D...|], [0, 1.5]);

Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf
CORE-MATH reciprocal throughput   : 18.586
System LIBC reciprocal throughput : 50.068

LIBC reciprocal throughput        : 33.823
LIBC reciprocal throughput        : 25.161     (with `-msse4.2` flag)
LIBC reciprocal throughput        : 19.157     (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf --latency
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH latency   : 55.630
System LIBC latency : 106.264

LIBC latency        : 96.060
LIBC latency        : 90.727    (with `-msse4.2` flag)
LIBC latency        : 82.361    (with `-mfma` flag)

Diff Detail

Event Timeline

lntue created this revision.Aug 11 2022, 1:30 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 11 2022, 1:30 PM
lntue requested review of this revision.Aug 11 2022, 1:30 PM
lntue edited the summary of this revision. (Show Details)Aug 11 2022, 7:56 PM
orex accepted this revision.Aug 12 2022, 1:39 AM
This revision is now accepted and ready to land.Aug 12 2022, 1:39 AM