This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] Implement double precision log1p correctly rounded to all rounding modes.
ClosedPublic

Authored by lntue on May 21 2023, 11:12 AM.

Details

Summary

Implement the double precision log1p function, correctly rounded in all
rounding modes.
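
For context, a minimal sketch of why log1p exists as a separate function and what "all rounding modes" refers to. It uses only the standard `<cmath>` and `<cfenv>` facilities, not the patch's code; the helper names `naive_log1p` and `log1p_in_mode` are hypothetical, and it assumes the compiler does not fold or move the math call across `fesetround`.

```cpp
#include <cfenv>
#include <cmath>

// Naive formulation: 1.0 + x is rounded to double first, so for tiny x
// some or all of x is lost before the logarithm is taken.
double naive_log1p(double x) {
  volatile double one_plus_x = 1.0 + x;  // volatile: force rounding to double
  return std::log(one_plus_x);
}

// Evaluate std::log1p under one of the four rounding modes (FE_TONEAREST,
// FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO), then restore the previous mode.
// Assumption: the volatile copies are enough to keep the compiler from
// folding the call or moving it across the fesetround calls.
double log1p_in_mode(double x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double input = x;
  volatile double result = std::log1p(input);
  std::fesetround(saved);
  return result;
}
```

For x = 1e-17, `naive_log1p` returns 0 because 1.0 + x rounds to 1.0, while `std::log1p` returns a value close to x. With a correctly rounded implementation, the FE_DOWNWARD and FE_UPWARD results are adjacent doubles that bracket the exact value (or are equal when the exact value is representable); an implementation that is not correctly rounded gives no such guarantee.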

Performance

  • For 0.5 <= x <= 2, the fast pass hit rate is about 99.93%.
  • Benchmarks with ./perf.sh tool from the CORE-MATH project, unit is (CPU clocks / call).
  • Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
$ ./perf.sh log1p
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call;
  • Latency from CORE-MATH's perf tool on Ryzen 5900X:
$ ./perf.sh log1p --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call;
  • Accurate pass latency:
$ ./perf.sh log1p --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
760.444

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
827.880

-- LIBC latency -- with FMA
711.837

-- LIBC latency -- without FMA
764.317
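
The "fast pass" and "accurate pass" figures above suggest a two-stage evaluation. The patch's code is not reproduced in this archive, so the following is only a sketch of the general idea (often called Ziv's strategy), shown for float rather than double and using `std::log1p` as a stand-in for both passes. The names `log1pf_two_pass`, `Log1pfResult` and `kFastRelErr` are hypothetical, and the error bound is assumed, not proven.

```cpp
#include <cassert>
#include <cmath>

struct Log1pfResult {
  float value;
  bool used_fast_pass;
};

// Assumed relative error bound for the fast pass: 2^-50, i.e. we assume
// std::log1p on double is accurate to within 4 ulp. This is an assumption
// for illustration, not a property guaranteed by the C++ standard.
const double kFastRelErr = 8.8817841970012523e-16;

Log1pfResult log1pf_two_pass(float x) {
  assert(x > -1.0f && "sketch only covers inputs with a real, finite log");

  // Fast pass: evaluate in double and attach an error bound.
  const double y = std::log1p(static_cast<double>(x));
  const double err = std::fabs(y) * kFastRelErr;

  // Rounding test: if both ends of [y - err, y + err] round to the same
  // float, every value in the interval does, so the result is decided.
  const float lo = static_cast<float>(y - err);
  const float hi = static_cast<float>(y + err);
  if (lo == hi) return {lo, true};

  // Accurate pass (stand-in): redo the evaluation in a wider format.
  // A real implementation would use a higher-precision algorithm with
  // a proven error bound here.
  const long double ya = std::log1p(static_cast<long double>(x));
  return {static_cast<float>(ya), false};
}
```

The rounding test is what makes the fast pass cheap on average: only inputs whose result lies very close to a rounding boundary fall through to the slow path, which is consistent with a hit rate close to 100%.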

Event Timeline

lntue created this revision. May 21 2023, 11:12 AM
Herald added projects: Restricted Project, Restricted Project. May 21 2023, 11:12 AM
lntue requested review of this revision. May 21 2023, 11:12 AM
lntue edited the summary of this revision. May 21 2023, 11:44 AM
zimmermann6 requested changes to this revision. May 23 2023, 6:35 AM

I get the same error as for log2:

CMake Error at /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:5 (get_target_property):
  get_target_property() called with non-existent target
  "libc.src.math.generic.log_range_reduction".
Call Stack (most recent call first):
  /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:35 (collect_object_file_deps)
  /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:35 (collect_object_file_deps)
  /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:82 (collect_object_file_deps)
  /localdisk/zimmerma/llvm-project/libc/lib/CMakeLists.txt:26 (add_entrypoint_library)

This revision now requires changes to proceed. May 23 2023, 6:35 AM
lntue updated this revision to Diff 524717. May 23 2023, 7:50 AM

Sync to HEAD.

zimmermann6 accepted this revision. May 23 2023, 8:01 AM

This function is faster than CORE-MATH, even for reciprocal throughput. Great work!

GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 49.596 + 0.352 clc/call; Median-Min = 0.315 clc/call; Max = 50.216 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 26.743 + 0.357 clc/call; Median-Min = 0.333 clc/call; Max = 29.183 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 38.887 + 0.296 clc/call; Median-Min = 0.249 clc/call; Max = 41.140 clc/call;
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 94.356 + 0.365 clc/call; Median-Min = 0.288 clc/call; Max = 95.270 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 69.370 + 0.356 clc/call; Median-Min = 0.262 clc/call; Max = 70.010 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 67.833 + 0.346 clc/call; Median-Min = 0.292 clc/call; Max = 68.525 clc/call;
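
The numbers in this thread come in two kinds, reciprocal throughput and latency. As a rough illustration of the difference, and not a reproduction of how CORE-MATH's perf.sh measures it, the sketch below times independent calls against a chain of dependent calls. The function names are hypothetical, and it reports nanoseconds per call from `std::chrono` rather than CPU clocks per call.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Reciprocal-throughput style: every call is independent of the others,
// so the CPU may overlap consecutive calls.
// Returns the average time per call in nanoseconds.
double independent_ns_per_call(const std::vector<double>& inputs) {
  if (inputs.empty()) return 0.0;
  volatile double sink = 0.0;  // keeps the calls from being optimized away
  const auto start = std::chrono::steady_clock::now();
  for (double x : inputs) sink = std::log1p(x);
  const auto stop = std::chrono::steady_clock::now();
  (void)sink;
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(inputs.size());
}

// Latency style: each argument depends on the previous result, so a call
// cannot start before the previous one has finished. For finite carry,
// 0.0 * carry is zero and the argument stays equal to x; inputs are
// assumed to be greater than -1 so that carry stays finite.
double dependent_ns_per_call(const std::vector<double>& inputs) {
  if (inputs.empty()) return 0.0;
  double carry = 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (double x : inputs) carry = std::log1p(x + 0.0 * carry);
  const auto stop = std::chrono::steady_clock::now();
  volatile double sink = carry;  // make the final result observable
  (void)sink;
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(inputs.size());
}
```

On an out-of-order CPU the dependent variant is usually the slower of the two, which matches the pattern in the results above where latency figures exceed the corresponding reciprocal throughput figures.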
This revision is now accepted and ready to land. May 23 2023, 8:01 AM
This revision was landed with ongoing or failed builds. May 23 2023, 8:04 AM
This revision was automatically updated to reflect the committed changes.