This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] Implement double precision log function correctly rounded to all rounding modes.
Closed, Public

Authored by lntue on May 8 2023, 11:06 AM.

Details

Summary

Implement double precision log function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.

Performance

  • For 0.5 <= x <= 2, the fast pass hit rate is about 99.93%.
  • Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
$ ./perf.sh log
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call;
  • Latency from CORE-MATH's perf tool on Ryzen 5900X:
$ ./perf.sh log --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call;
  • Accurate pass latency:
$ ./perf.sh log --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
598.306

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
632.925

-- LIBC latency -- with FMA
455.632

-- LIBC latency -- without FMA
488.564

Event Timeline

lntue created this revision. May 8 2023, 11:06 AM
Herald added projects: Restricted Project, Restricted Project. May 8 2023, 11:06 AM
lntue requested review of this revision. May 8 2023, 11:06 AM
lntue edited the summary of this revision. May 8 2023, 11:24 AM
lntue updated this revision to Diff 520727. May 9 2023, 9:22 AM

Update error bounds for fast pass and sync to HEAD.

lntue updated this revision to Diff 524104. May 21 2023, 7:53 AM

Sync to HEAD.

zimmermann6 requested changes to this revision. May 23 2023, 6:29 AM

I get failures when I try to apply this patch to head (revision 7489301):

patching file libc/config/darwin/arm/entrypoints.txt
patching file libc/config/linux/aarch64/entrypoints.txt
patching file libc/config/linux/x86_64/entrypoints.txt
patching file libc/config/windows/entrypoints.txt
patching file libc/spec/stdc.td
patching file libc/src/math/CMakeLists.txt
patching file libc/src/math/generic/CMakeLists.txt
Hunk #1 succeeded at 784 with fuzz 2 (offset 16 lines).
Hunk #2 FAILED at 785.
Hunk #3 succeeded at 846 (offset -3 lines).
1 out of 3 hunks FAILED -- saving rejects to file libc/src/math/generic/CMakeLists.txt.rej
patching file libc/src/math/generic/common_constants.h
Hunk #1 FAILED at 39.
1 out of 1 hunk FAILED -- saving rejects to file libc/src/math/generic/common_constants.h.rej
patching file libc/src/math/generic/common_constants.cpp
Hunk #1 succeeded at 196 with fuzz 2 (offset -169 lines).
patching file libc/src/math/generic/log.cpp
patching file libc/src/math/generic/log10.cpp
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 41.
Hunk #3 FAILED at 785.
Hunk #4 FAILED at 944.
4 out of 4 hunks FAILED -- saving rejects to file libc/src/math/generic/log10.cpp.rej
patching file libc/src/math/generic/log_range_reduction.h
patching file libc/src/math/log.h
patching file libc/test/src/math/CMakeLists.txt
patching file libc/test/src/math/log_test.cpp
This revision now requires changes to proceed. May 23 2023, 6:29 AM
lntue updated this revision to Diff 524702. May 23 2023, 7:20 AM

Sync to HEAD.

zimmermann6 accepted this revision. May 23 2023, 7:30 AM

It works now, thanks. On an AMD EPYC 7282, I get the following reciprocal throughput:

GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 21.404 + 0.252 clc/call; Median-Min = 0.295 clc/call; Max = 23.841 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 13.188 + 0.181 clc/call; Median-Min = 0.036 clc/call; Max = 13.582 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 22.978 + 0.294 clc/call; Median-Min = 0.304 clc/call; Max = 23.516 clc/call;

and for the latency:

GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 57.524 + 0.353 clc/call; Median-Min = 0.298 clc/call; Max = 58.199 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 49.889 + 0.110 clc/call; Median-Min = 0.058 clc/call; Max = 50.229 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 51.938 + 0.328 clc/call; Median-Min = 0.307 clc/call; Max = 52.466 clc/call;
This revision is now accepted and ready to land. May 23 2023, 7:30 AM
This revision was landed with ongoing or failed builds. May 23 2023, 7:35 AM
This revision was automatically updated to reflect the committed changes.

Hey @lntue,

libc/src/math/generic/log_range_reduction.h, line 62

Can you replace __int128_t with MType? It would fix rv32 compilation.
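
For context, `__int128_t` is a GCC/Clang extension that is generally unavailable on 32-bit targets such as rv32, and compilers that support it define the `__SIZEOF_INT128__` macro. The sketch below shows one generic way to guard such code, assuming a hypothetical `U128Parts` struct and `mul_64x64` helper. It is not the `MType` mentioned above, whose definition is not shown on this page.

```cpp
#include <cstdint>

// Hypothetical 128-bit result held as two 64-bit halves (not the patch's MType).
struct U128Parts {
  uint64_t lo;
  uint64_t hi;
};

// Full 64x64 -> 128-bit multiply using only 64-bit arithmetic, so it also
// compiles on targets without a native 128-bit integer type.
constexpr U128Parts mul_64x64_portable(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xffffffffu;
  const uint64_t a_lo = a & mask, a_hi = a >> 32;
  const uint64_t b_lo = b & mask, b_hi = b >> 32;
  const uint64_t p0 = a_lo * b_lo;  // weight 2^0
  const uint64_t p1 = a_lo * b_hi;  // weight 2^32
  const uint64_t p2 = a_hi * b_lo;  // weight 2^32
  const uint64_t p3 = a_hi * b_hi;  // weight 2^64
  // Sum of three values below 2^32 each, so it cannot overflow 64 bits.
  const uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);
  return {(mid << 32) | (p0 & mask),
          p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32)};
}

// Uses the native 128-bit type when the compiler provides it, and the
// portable path otherwise.
constexpr U128Parts mul_64x64(uint64_t a, uint64_t b) {
#ifdef __SIZEOF_INT128__
  const __uint128_t p = static_cast<__uint128_t>(a) * b;
  return {static_cast<uint64_t>(p), static_cast<uint64_t>(p >> 64)};
#else
  return mul_64x64_portable(a, b);
#endif
}
```

Both paths return the same halves, so callers do not need to know which one was compiled in.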