# [libc][math] Implement erff function correctly rounded to all rounding modes.

Authored by lntue on Jun 23 2023, 9:11 PM.

# Details

Reviewers: michaelrj, sivachandra, renyichen, zimmermann6
Commits: rGf320fefc4ad0: [libc][math] Implement erff function correctly rounded to all rounding modes.
Summary

Implement a correctly rounded erff function.

For x >= 4, erff(x) = 1 for FE_TONEAREST or FE_UPWARD, and 0x1.fffffep-1 (the largest single-precision value below 1) for FE_DOWNWARD or FE_TOWARDZERO.
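
As a sanity check on these saturation values, the following Python sketch (not part of the patch) uses double-precision arithmetic to confirm that erf(4) lies strictly between the largest single-precision value below 1, `0x1.fffffep-1` = 1 - 2^-24, and 1, and that it sits above their midpoint. The helper name `saturated_erff` and the use of strings to name rounding modes are hypothetical, chosen only for illustration.

```python
import math

ONE = 1.0
BELOW_ONE = float.fromhex("0x1.fffffep-1")  # largest binary32 value below 1


def saturated_erff(mode: str) -> float:
    """Correctly rounded erff(x) for x >= 4 under the named mode (hypothetical helper)."""
    if mode in ("FE_TONEAREST", "FE_UPWARD"):
        return ONE
    if mode in ("FE_DOWNWARD", "FE_TOWARDZERO"):
        return BELOW_ONE
    raise ValueError(f"unknown rounding mode: {mode}")


exact = math.erf(4.0)              # double-precision reference value
midpoint = (ONE + BELOW_ONE) / 2   # 1 - 2**-25, the round-to-nearest tie point

# erf is increasing and bounded above by 1, so both checks extend to all x >= 4.
print(BELOW_ONE < exact < ONE)     # rounding down or toward zero yields BELOW_ONE
print(exact > midpoint)            # rounding to nearest (or upward) yields 1.0
```

Because the exact value never reaches 1 and never drops to the midpoint or below for x >= 4, the two constants cover all four rounding modes.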

For 0 <= x < 4, we divide the range into 32 sub-intervals of length 1/8 and approximate erff(x) on each sub-interval with a degree-15 odd polynomial:

`erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).`

For x < 0, the same formula applies: the polynomial is odd, with x factored out and the remaining factor depending only on x^2, so the sign of the result follows the sign of x.
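
The evaluation scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the patch's code: the per-sub-interval coefficient sets are not reproduced in this summary, so the sketch fills in sub-interval 0 only, using the Taylor coefficients of erf at 0 as a stand-in. The names `sub_interval`, `TAYLOR_COEFFS`, and `erf_poly` are hypothetical.

```python
import math


def sub_interval(x: float) -> int:
    """Index of the length-1/8 sub-interval containing |x|, assuming |x| < 4."""
    return int(abs(x) * 8.0)


# Stand-in coefficients c0..c7, valid near 0 only, from the Taylor series
# erf(x) = 2/sqrt(pi) * sum_n (-1)^n * x^(2n+1) / (n! * (2n+1)).
TAYLOR_COEFFS = [
    2.0 / math.sqrt(math.pi) * (-1.0) ** n / (math.factorial(n) * (2 * n + 1))
    for n in range(8)
]


def erf_poly(x: float, coeffs=TAYLOR_COEFFS) -> float:
    """Evaluate x * (c0 + c1*x^2 + ... + c7*x^14) with Horner's rule in x^2."""
    x2 = x * x
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x2 + c
    return x * acc  # the sign of x is carried by this final multiplication


x = 0.1
print(sub_interval(x), erf_poly(x), math.erf(x))
print(erf_poly(-x) == -erf_poly(x))  # odd symmetry holds exactly
```

Within sub-interval 0 the stand-in polynomial agrees with `math.erf` to roughly double-precision rounding error. Any other sub-interval would need its own coefficient set, selected by the index that `sub_interval` returns.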

Performance was tested with the perf.sh tool from the CORE-MATH project on an AMD Ryzen 9 5900X:

Reciprocal throughput (clock cycles / op)

```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;

-- LIBC reciprocal throughput --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```

and latency (clock cycles / op):

```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;

-- LIBC latency --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```

### Event Timeline

lntue created this revision. Jun 23 2023, 9:11 PM
Herald added projects: Restricted Project, Restricted Project. Jun 23 2023, 9:11 PM
Herald added subscribers: luke, frasercrmck, luismarques and 20 others.
lntue requested review of this revision. Jun 23 2023, 9:11 PM
Herald added a subscriber: wangpc. Jun 23 2023, 9:11 PM
lntue edited the summary of this revision. Jun 23 2023, 9:33 PM
Herald added a subscriber: pengfei. Jun 23 2023, 9:34 PM
lntue updated this revision to Diff 534154. Jun 23 2023, 9:41 PM

Fix bazel dependency.

lntue updated this revision to Diff 534650. Jun 26 2023, 10:52 AM

overall LGTM with a nit.

libc/test/src/math/erff_test.cpp:77

does this still need to be here?

lntue updated this revision to Diff 535387. Jun 28 2023, 7:08 AM

Remove unneeded assertion commented out in the unit test.

lntue marked an inline comment as done. Jun 28 2023, 7:08 AM
michaelrj accepted this revision. Jun 28 2023, 10:33 AM
This revision is now accepted and ready to land. Jun 28 2023, 10:33 AM
This revision was automatically updated to reflect the committed changes.

I'm not sure this is related to this ticket, but I cannot compile main any more under Ubuntu:

```
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/specfun.h:50:
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/tr1/ell_integral.tcc:709:14: error: no member named '__throw_domain_error' in namespace 'std'; did you mean '__throw_runtime_error'?
~~~~~^
/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/functexcept.h:131:3: note: '__throw_runtime_error' declared here
__throw_runtime_error(const char*)
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
```

It seems clang-14 depends on libstdc++-12-dev, but uses features from libstdc++-13-dev...

lntue added a comment. Jul 4 2023, 10:08 PM

@zimmermann6:
I don't think it's related to this patch at all, but I recently had problems building with clang on Ubuntu as well. The root cause in my case was that, after an update, the clang and clang++ versions were mismatched and some dependencies were missing. After running apt install / update for clang and clang++, the problems went away. You can try that and see if it works on your system.