This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] Implement double precision log2 function correctly rounded to all rounding modes.
Closed, Public

Authored by lntue on May 11 2023, 8:11 AM.

Details

Summary

Implement double precision log2 function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.
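
The fast pass and accurate pass mentioned under Performance follow the usual two-pass pattern for correctly rounded functions: compute a cheap approximation with a known error bound, check whether that error could change the rounded result, and fall back to a slower, more accurate evaluation only when it could. The block below is a minimal sketch of that rounding test, not the code in this patch. To stay self-contained it targets a float result computed in double. The name `log2f_two_pass_sketch`, the `Log2Result` struct, and the 2^-50 relative error bound for `std::log2(double)` are all assumptions made for illustration, and the long double fallback is only a stand-in for an accurate pass with a proven bound.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type for illustration; not part of LLVM libc.
struct Log2Result {
  float value;         // log2(x) rounded to float
  bool used_fast_pass; // true if the rounding test succeeded
};

// Hypothetical sketch of a two-pass (fast pass / accurate pass) evaluation.
// ASSUMPTION: std::log2(double) has relative error below 2^-50. This is not
// guaranteed by the C++ standard; a real implementation proves its own bound.
inline Log2Result log2f_two_pass_sketch(float x) {
  assert(std::isfinite(x) && x > 0.0f);

  // Fast pass: evaluate in double, which has 29 more significand bits
  // than float.
  const double y = std::log2(static_cast<double>(x));
  const double err = std::fabs(y) * std::ldexp(1.0, -50); // assumed bound

  // Rounding test: if both ends of [y - err, y + err] convert to the same
  // float, the error cannot affect the result. The conversions follow the
  // current rounding mode.
  const float lo = static_cast<float>(y - err);
  const float hi = static_cast<float>(y + err);
  if (lo == hi)
    return {lo, true};

  // Accurate pass (stand-in): redo the computation in long double. On
  // platforms where long double is the same as double this adds nothing.
  const long double z = std::log2(static_cast<long double>(x));
  return {static_cast<float>(z), false};
}
```

With this structure, the cost of the accurate pass matters little on average as long as the rounding test succeeds for almost all inputs, which is what the fast pass rate under Performance measures.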

Performance

  • For 0.5 <= x <= 2, the fast pass hit rate is about 99.91%.
  • Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;
  • Latency from CORE-MATH's perf tool on Ryzen 5900X:
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
  • Accurate pass latency:
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
177.632

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332

-- LIBC latency -- with FMA
459.751

-- LIBC latency -- without FMA
463.850
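
The measurements above report two different quantities. As the terms are commonly used, reciprocal throughput times a stream of independent calls, which the CPU can overlap, while latency times a chain in which each call has to wait for the previous result. The block below is a rough sketch of that difference and is not CORE-MATH's perf tool: the function names are hypothetical, it calls `std::log2` from the system library, and it reports nanoseconds per call rather than clock cycles.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical harness for illustration; not CORE-MATH's perf.sh.

// Independent calls: no call needs the result of another, so the CPU is
// free to overlap them.
inline double throughput_ns_per_call(const std::vector<double> &inputs) {
  if (inputs.empty())
    return 0.0;
  volatile double sink = 0.0; // keeps the calls from being optimized away
  const auto start = std::chrono::steady_clock::now();
  for (double x : inputs)
    sink = std::log2(x);
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(inputs.size());
}

// Dependent calls: each argument is built from the previous result, so the
// calls are serialized.
inline double latency_ns_per_call(const std::vector<double> &inputs) {
  if (inputs.empty())
    return 0.0;
  volatile double sink = 0.0;
  double dep = 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (double x : inputs) {
    // dep * 0.0 is a zero for finite dep, so the argument is still x, but
    // without -ffast-math the compiler must keep the data dependency.
    dep = std::log2(x + dep * 0.0);
  }
  const auto stop = std::chrono::steady_clock::now();
  sink = dep;
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(inputs.size());
}
```

Filling `inputs` with values from the range of interest, for example 0.5 <= x <= 2, and calling both functions should show the dependent chain taking noticeably longer per call, in line with the gap between the throughput and latency figures above.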

Event Timeline

lntue created this revision. May 11 2023, 8:11 AM
Herald added projects: Restricted Project, Restricted Project. May 11 2023, 8:11 AM
lntue requested review of this revision. May 11 2023, 8:11 AM
lntue edited the summary of this revision. May 11 2023, 8:27 AM
lntue updated this revision to Diff 524105. May 21 2023, 7:54 AM

Sync to HEAD.

lntue updated this revision to Diff 524121. May 21 2023, 11:30 AM

Changed log2 function name in memory_utils.

lntue updated this revision to Diff 524149. May 21 2023, 8:00 PM

Update utils_test.cpp

zimmermann6 requested changes to this revision. May 23 2023, 6:33 AM

I get an error while running ninja:

CMake Error at /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:5 (get_target_property):
  get_target_property() called with non-existent target
  "libc.src.math.generic.log_range_reduction".
Call Stack (most recent call first):
  /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:35 (collect_object_file_deps)
  /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:35 (collect_object_file_deps)
  /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:82 (collect_object_file_deps)
  /localdisk/zimmerma/llvm-project/libc/lib/CMakeLists.txt:26 (add_entrypoint_library)
This revision now requires changes to proceed. May 23 2023, 6:33 AM
lntue updated this revision to Diff 524708. May 23 2023, 7:36 AM

Sync to HEAD.

zimmermann6 accepted this revision. May 23 2023, 7:44 AM

Here is what I get on my AMD EPYC 7282 for the reciprocal throughput:

GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 21.110 + 0.272 clc/call; Median-Min = 0.280 clc/call; Max = 21.622 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 21.312 + 0.198 clc/call; Median-Min = 0.085 clc/call; Max = 23.254 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 31.690 + 0.395 clc/call; Median-Min = 0.345 clc/call; Max = 34.334 clc/call;

and for the latency:

GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 58.310 + 0.416 clc/call; Median-Min = 0.304 clc/call; Max = 59.411 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 55.688 + 0.187 clc/call; Median-Min = 0.093 clc/call; Max = 56.306 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 62.254 + 0.355 clc/call; Median-Min = 0.339 clc/call; Max = 62.856 clc/call;
This revision is now accepted and ready to land. May 23 2023, 7:44 AM
This revision was landed with ongoing or failed builds. May 23 2023, 7:49 AM
This revision was automatically updated to reflect the committed changes.