Simplify tanf implementation and improve its performance.
Reuse the existing implementation of sinf, cosf, and sincosf in full, and
compute the result from the identity tan(x) = sin(x)/cos(x).
Performance benchmark using the perf tool from the CORE-MATH project on a Ryzen 1700:
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 18.558
System LIBC reciprocal throughput : 49.919

BEFORE:
LIBC reciprocal throughput        : 36.480
LIBC reciprocal throughput        : 27.217 (with `-msse4.2` flag)
LIBC reciprocal throughput        : 20.205 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput        : 30.337
LIBC reciprocal throughput        : 21.072 (with `-msse4.2` flag)
LIBC reciprocal throughput        : 15.804 (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 56.702
System LIBC latency : 107.206

BEFORE:
LIBC latency        : 97.598
LIBC latency        : 91.119 (with `-msse4.2` flag)
LIBC latency        : 82.655 (with `-mfma` flag)
AFTER:
LIBC latency        : 74.560
LIBC latency        : 66.575 (with `-msse4.2` flag)
LIBC latency        : 61.636 (with `-mfma` flag)