User Details
- User Since
- Nov 30 2021, 1:16 AM (77 w, 6 d)
Tue, May 23
this function is faster than core-math, even for the reciprocal throughput, great work!
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 49.596 + 0.352 clc/call; Median-Min = 0.315 clc/call; Max = 50.216 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 26.743 + 0.357 clc/call; Median-Min = 0.333 clc/call; Max = 29.183 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 38.887 + 0.296 clc/call; Median-Min = 0.249 clc/call; Max = 41.140 clc/call;
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 94.356 + 0.365 clc/call; Median-Min = 0.288 clc/call; Max = 95.270 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 69.370 + 0.356 clc/call; Median-Min = 0.262 clc/call; Max = 70.010 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 67.833 + 0.346 clc/call; Median-Min = 0.292 clc/call; Max = 68.525 clc/call;
here is what I get on my AMD EPYC 7282 for the reciprocal throughput:
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 21.110 + 0.272 clc/call; Median-Min = 0.280 clc/call; Max = 21.622 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 21.312 + 0.198 clc/call; Median-Min = 0.085 clc/call; Max = 23.254 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 31.690 + 0.395 clc/call; Median-Min = 0.345 clc/call; Max = 34.334 clc/call;
and for the latency:
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 58.310 + 0.416 clc/call; Median-Min = 0.304 clc/call; Max = 59.411 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 55.688 + 0.187 clc/call; Median-Min = 0.093 clc/call; Max = 56.306 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 62.254 + 0.355 clc/call; Median-Min = 0.339 clc/call; Max = 62.856 clc/call;
it works now, thanks. For the reciprocal throughput on an AMD EPYC 7282 I get:
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 21.404 + 0.252 clc/call; Median-Min = 0.295 clc/call; Max = 23.841 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 13.188 + 0.181 clc/call; Median-Min = 0.036 clc/call; Max = 13.582 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 22.978 + 0.294 clc/call; Median-Min = 0.304 clc/call; Max = 23.516 clc/call;
and for the latency:
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 57.524 + 0.353 clc/call; Median-Min = 0.298 clc/call; Max = 58.199 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 49.889 + 0.110 clc/call; Median-Min = 0.058 clc/call; Max = 50.229 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 51.938 + 0.328 clc/call; Median-Min = 0.307 clc/call; Max = 52.466 clc/call;
I get the same error as for log2:
CMake Error at /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:5 (get_target_property):
get_target_property() called with non-existent target "libc.src.math.generic.log_range_reduction".
Call Stack (most recent call first):
/localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:35 (collect_object_file_deps)
/localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:35 (collect_object_file_deps)
/localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:82 (collect_object_file_deps)
/localdisk/zimmerma/llvm-project/libc/lib/CMakeLists.txt:26 (add_entrypoint_library)
I get an error while running ninja:
CMake Error at /localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:5 (get_target_property):
get_target_property() called with non-existent target "libc.src.math.generic.log_range_reduction".
Call Stack (most recent call first):
/localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:35 (collect_object_file_deps)
/localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:35 (collect_object_file_deps)
/localdisk/zimmerma/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:82 (collect_object_file_deps)
/localdisk/zimmerma/llvm-project/libc/lib/CMakeLists.txt:26 (add_entrypoint_library)
I get failures when I try to apply this patch to head (revision 7489301):
patching file libc/config/darwin/arm/entrypoints.txt
patching file libc/config/linux/aarch64/entrypoints.txt
patching file libc/config/linux/x86_64/entrypoints.txt
patching file libc/config/windows/entrypoints.txt
patching file libc/spec/stdc.td
patching file libc/src/math/CMakeLists.txt
patching file libc/src/math/generic/CMakeLists.txt
Hunk #1 succeeded at 784 with fuzz 2 (offset 16 lines).
Hunk #2 FAILED at 785.
Hunk #3 succeeded at 846 (offset -3 lines).
1 out of 3 hunks FAILED -- saving rejects to file libc/src/math/generic/CMakeLists.txt.rej
patching file libc/src/math/generic/common_constants.h
Hunk #1 FAILED at 39.
1 out of 1 hunk FAILED -- saving rejects to file libc/src/math/generic/common_constants.h.rej
patching file libc/src/math/generic/common_constants.cpp
Hunk #1 succeeded at 196 with fuzz 2 (offset -169 lines).
patching file libc/src/math/generic/log.cpp
patching file libc/src/math/generic/log10.cpp
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 41.
Hunk #3 FAILED at 785.
Hunk #4 FAILED at 944.
4 out of 4 hunks FAILED -- saving rejects to file libc/src/math/generic/log10.cpp.rej
patching file libc/src/math/generic/log_range_reduction.h
patching file libc/src/math/log.h
patching file libc/test/src/math/CMakeLists.txt
patching file libc/test/src/math/log_test.cpp
all worst cases from core-math do pass. However I get a larger reciprocal throughput on an AMD EPYC 7282:
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 21.270 + 0.322 clc/call; Median-Min = 0.302 clc/call; Max = 23.839 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 26.172 + 0.346 clc/call; Median-Min = 0.310 clc/call; Max = 26.798 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 49.688 + 0.306 clc/call; Median-Min = 0.308 clc/call; Max = 50.288 clc/call;
For the latency I get similar values:
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 65.915 + 0.316 clc/call; Median-Min = 0.282 clc/call; Max = 66.518 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 73.612 + 0.389 clc/call; Median-Min = 0.326 clc/call; Max = 74.422 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 66.387 + 0.276 clc/call; Median-Min = 0.303 clc/call; Max = 67.140 clc/call;
Apr 10 2023
all tests are ok now. Reciprocal throughput:
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 9.649 + 0.454 clc/call; Median-Min = 0.288 clc/call; Max = 11.526 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 7.224 + 0.310 clc/call; Median-Min = 0.320 clc/call; Max = 8.306 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 10.579 + 0.162 clc/call; Median-Min = 0.024 clc/call; Max = 11.528 clc/call;
Latency:
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 42.395 + 0.340 clc/call; Median-Min = 0.322 clc/call; Max = 42.983 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 38.033 + 0.434 clc/call; Median-Min = 0.304 clc/call; Max = 39.897 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 41.699 + 0.343 clc/call; Median-Min = 0.327 clc/call; Max = 42.248 clc/call;
Apr 7 2023
thanks, all tests do pass now. For the reciprocal throughput I get:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh logf
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 10.839 + 0.378 clc/call; Median-Min = 0.304 clc/call; Max = 13.593 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 7.240 + 0.351 clc/call; Median-Min = 0.307 clc/call; Max = 9.576 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 18.822 + 0.339 clc/call; Median-Min = 0.314 clc/call; Max = 19.396 clc/call;
and for the latency:
zimmerma@biscotte:~/svn/core-math$ PERF_ARGS=--latency LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh logf
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 46.968 + 0.321 clc/call; Median-Min = 0.301 clc/call; Max = 47.649 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 38.243 + 0.328 clc/call; Median-Min = 0.318 clc/call; Max = 38.928 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 54.652 + 0.404 clc/call; Median-Min = 0.329 clc/call; Max = 55.396 clc/call;
I get an error at compilation:
/localdisk/zimmerma/llvm-project/libc/src/math/generic/log2f.cpp:149:48: error: use of undeclared identifier 'R'
static_cast<double>(R[index]), -1.0); // Exact
^
the patch fails to apply to main (revision 10cff75):
$ patch -p1 -i /tmp/D147755.diff
patching file libc/src/math/generic/common_constants.h
Hunk #1 succeeded at 17 with fuzz 2 (offset -3 lines).
patching file libc/src/math/generic/common_constants.cpp
Hunk #1 FAILED at 109.
Hunk #2 succeeded at 102 with fuzz 2 (offset -28 lines).
1 out of 2 hunks FAILED -- saving rejects to file libc/src/math/generic/common_constants.cpp.rej
patching file libc/src/math/generic/logf.cpp
patching file libc/test/src/math/logf_test.cpp
Apr 6 2023
the failure is fixed now, and the performance is slightly better than the 'main' branch on my machine:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh log10f
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 10.523 + 0.392 clc/call; Median-Min = 0.332 clc/call; Max = 11.603 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 18.521 + 0.418 clc/call; Median-Min = 0.330 clc/call; Max = 19.653 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 12.929 + 0.626 clc/call; Median-Min = 0.297 clc/call; Max = 15.292 clc/call;
I get a failure for rounding down:
zimmerma@biscotte:~/svn/core-math$ CORE_MATH_CHECK_STD=true LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./check.sh log10f
Running exhaustive check in --rndn mode... all ok
Running exhaustive check in --rndz mode... all ok
Running exhaustive check in --rndu mode... all ok
Running exhaustive check in --rndd mode... FAIL x=0x1p+0 ref=0x0p+0 y=-0x0p+0
Also, on an AMD EPYC 7282 I get a regression in speed. With master:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh log10f
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 10.531 + 0.273 clc/call; Median-Min = 0.281 clc/call; Max = 13.047 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 18.529 + 0.342 clc/call; Median-Min = 0.309 clc/call; Max = 19.811 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 13.059 + 0.526 clc/call; Median-Min = 0.290 clc/call; Max = 15.586 clc/call;
With this patch:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh log10f
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 10.534 + 0.297 clc/call; Median-Min = 0.303 clc/call; Max = 11.415 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 18.529 + 0.561 clc/call; Median-Min = 0.327 clc/call; Max = 20.729 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 19.791 + 0.313 clc/call; Median-Min = 0.338 clc/call; Max = 22.809 clc/call;
Jan 28 2023
all exhaustive tests do pass. The performance is a little worse than CORE-MATH and glibc:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh acoshf
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 17.383 + 0.168 clc/call; Median-Min = 0.004 clc/call; Max = 17.769 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 21.634 + 0.150 clc/call; Median-Min = 0.059 clc/call; Max = 22.014 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 25.034 + 0.320 clc/call; Median-Min = 0.305 clc/call; Max = 25.602 clc/call;
I cannot apply to main (revision f7c1982), the patch fails.
Jan 27 2023
I confirm all exhaustive tests do pass, and the timings are similar to CORE-MATH:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh asinhf
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 % Ntrial = 20 ; Min = 24.687 + 0.365 clc/call; Median-Min = 0.296 clc/call; Max = 26.051 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 37.687 + 0.295 clc/call; Median-Min = 0.287 clc/call; Max = 39.359 clc/call;
[####################] 100 % Ntrial = 20 ; Min = 25.939 + 0.320 clc/call; Median-Min = 0.292 clc/call; Max = 26.596 clc/call;
Dec 14 2022
I tried it on the 2096978 hard-to-round cases I have generated, and all tests pass with all four rounding modes. Great work!
zimmerma@biscotte:/tmp/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./check.sh --worst log10
Running worst cases check in --rndn mode... 2096978 tests passed, 0 failure(s)
Running worst cases check in --rndz mode... 2096978 tests passed, 0 failure(s)
Running worst cases check in --rndu mode... 2096978 tests passed, 0 failure(s)
Running worst cases check in --rndd mode... 2096978 tests passed, 0 failure(s)
Sep 26 2022
I get similar timings:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanf
GNU libc version: 2.34
GNU libc release: stable
14.464
50.813
14.254
zimmerma@biscotte:~/svn/core-math$ PERF_ARGS=--latency LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanf
GNU libc version: 2.34
GNU libc release: stable
54.500
110.220
59.640
Good work!
Sep 21 2022
sorry for the delay. It seems this does not compile properly with current main:
/localdisk/zimmerma/llvm-project/libc/src/math/generic/expm1f.cpp:11:10: fatal error: 'expxf.h' file not found
#include "expxf.h"
^~~~~~~~~
Sep 19 2022
thank you, the new version is fine. Great work!
I confirm the function is still correctly rounded, and the timings improved.
I got one reject when applying this patch to main (revision 458598c):
patching file libc/src/math/generic/explogxf.h
Hunk #1 FAILED at 51.
Hunk #2 succeeded at 41 with fuzz 2 (offset -29 lines).
1 out of 2 hunks FAILED -- saving rejects to file libc/src/math/generic/explogxf.h.rej
Sep 16 2022
does this patch need to be applied on another one? Or rebased? It does not apply cleanly to main (71e52a1), unless I did something wrong.
Sep 15 2022
I confirm the improvement, and the function is still correctly rounded.
this is very clever! I confirm the speed improvement (and still correct rounding by exhaustive search).
Sep 9 2022
great work! The reciprocal throughput is indeed slightly better than CORE-MATH, and the latency slightly worse:
# reciprocal throughput
GNU libc version: 2.34
GNU libc release: stable
33.819
37.064
29.462
# latency
GNU libc version: 2.34
GNU libc release: stable
54.951
80.046
62.001
Sep 7 2022
ok for me. I have added more checks near the underflow and overflow boundaries in CORE-MATH check.sh, and it passes all tests.
it works now, thanks. I confirm it is correctly rounded. I get similar figures on an AMD EPYC 7282 (glibc, core-math, llvm-libc):
# reciprocal throughput
GNU libc version: 2.34
GNU libc release: stable
26.722
31.752
27.841
# latency
GNU libc version: 2.34
GNU libc release: stable
56.310
64.140
61.051
Sep 6 2022
the patch fails for me on top of main (revision ea953b9). Is there any other patch to apply first?
Aug 29 2022
ok for me, I get slightly different figures on an AMD EPYC 7282:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh atanf
GNU libc version: 2.34
GNU libc release: stable
17.539
31.797
26.903
ok for me. I get slightly better figures for llvm-libc:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh atanhf
GNU libc version: 2.34
GNU libc release: stable
23.547
70.432
20.065
This is on an AMD EPYC 7282.
Aug 22 2022
are the performance numbers for sinf, for cosf, or for random calls?
Jul 29 2022
I get slightly different figures on my machine:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh coshf
GNU libc version: 2.33
GNU libc release: release
17.730
19.322
22.815
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh coshf
GNU libc version: 2.33
GNU libc release: release
49.478
48.614
75.194
Jul 28 2022
this looks all good to me:
GNU libc version: 2.33
GNU libc release: release
17.271
25.064
13.555
GNU libc version: 2.33
GNU libc release: release
48.048
58.428
54.403
The first figures are for reciprocal throughput (core-math, glibc, llvm-libc), the second ones are for the latency. Great work!
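Since each run prints only three bare numbers, a small helper can attach library names to them. This is just a sketch: the name `label_timings` is made up for illustration, and the order (core-math, glibc, llvm-libc) is the one stated in this comment; other comments in this feed list the libraries in a different order, so the tuple may need adjusting.

```python
# Hypothetical helper (not part of core-math): label the three timings
# printed by one perf.sh run, assuming the order core-math, glibc, llvm-libc.
LIBRARIES = ("core-math", "glibc", "llvm-libc")

def label_timings(output: str) -> dict[str, float]:
    """Map the last three whitespace-separated tokens of `output` to library names."""
    tokens = output.split()
    if len(tokens) < 3:
        raise ValueError("expected at least three timings")
    return dict(zip(LIBRARIES, (float(t) for t in tokens[-3:])))

# Reciprocal-throughput figures quoted in this comment.
timings = label_timings(
    "GNU libc version: 2.33 GNU libc release: release 17.271 25.064 13.555"
)
fastest = min(timings, key=timings.get)
print(timings, "fastest:", fastest)
```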
I confirm the reciprocal throughput decreased from 18 to 10 cycles (on the same machine as above):
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh exp2f
GNU libc version: 2.33
GNU libc release: release
9.728
7.085
10.040
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh exp2f
GNU libc version: 2.33
GNU libc release: release
37.106
29.520
48.515
Good work Kirill!
Jul 27 2022
here are the timings I get:
zimmerma@biscotte:~/svn/core-math$ !273
LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh sinf
GNU libc version: 2.33
GNU libc release: release
16.784
23.823
14.114
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh sinf
GNU libc version: 2.33
GNU libc release: release
47.889
57.795
52.998
Jul 26 2022
I confirm that I get similar timings. Nice work!
I confirm it is still correctly rounded, and now faster than CORE-MATH. Nice work!
Jul 18 2022
I confirm the latest version is correctly rounded for all rounding modes on the machine I tried (AMD EPYC 7282 with gcc 10.2.1 and clang 11.0.1-2).
For the reciprocal throughput I get:
zimmerma@biscotte:/tmp/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_LAUNCHER="/localdisk/zimmerma/glibc-2.35/install/lib/ld-linux-x86-64.so.2 --library-path /localdisk/zimmerma/glibc-2.35/install/lib" CORE_MATH_PERF_MODE=rdtsc ./perf.sh sinf
GNU libc version: 2.35
GNU libc release: stable
16.705
23.636
13.989
i.e., 16.7 cycles for core-math, 23.6 cycles for glibc 2.35, and 14.0 cycles for llvm-libc. Good work!
For the latency the figures are worse than glibc:
zimmerma@biscotte:/tmp/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_LAUNCHER="/localdisk/zimmerma/glibc-2.35/install/lib/ld-linux-x86-64.so.2 --library-path /localdisk/zimmerma/glibc-2.35/install/lib" CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh sinf
GNU libc version: 2.35
GNU libc release: stable
47.926
57.338
62.912
Jul 12 2022
I couldn't build this patch on top of 'main' (revision 81af344):
/localdisk/zimmerma/llvm-project/libc/src/math/generic/coshf.cpp:11:10: fatal error: 'src/math/generic/expxf.h' file not found
#include "src/math/generic/expxf.h"
^~~~~~~~~~~~~~~~~~~~~~~~~~
Is there any dependency?
I confirm the new function is correctly rounded (for all rounding modes). As for efficiency, here is what I get on an AMD EPYC 7282 with gcc 10.2.1 and clang 11.0.1-2:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh exp2f
GNU libc version: 2.31
GNU libc release: stable
9.720
6.228
18.081
Jul 11 2022
I confirm the new version is correctly rounded (on the machine I tried it). However I find different figures for the number of cycles:
zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a CORE_MATH_PERF_MODE=rdtsc ./perf.sh sinf
GNU libc version: 2.31
GNU libc release: stable
16.781
23.443
32.737
This is on an AMD EPYC 7282 with gcc version 10.2.1 and clang 11.0.1-2 (I guess llvm-libc is compiled with clang). This gives 17 cycles for the core-math routine, and 33 cycles for the llvm-libc one.
May 2 2022
I'm sorry, I am not fluent enough in C++ to review this patch.
Apr 6 2022
all tests pass now, and I get the following figures (1st CORE-MATH, 2nd GNU libc, 3rd LLVM libc):
$ LIBM=/users/zimmerma/svn/core-math/libllvmlibc.a ./perf.sh sinf
38.997
26.503
33.990
I get an error for rounding up:
Using llvm-libc
MPFR library: 4.1.0
MPFR header: 4.1.0 (based on 4.1.0)
Checking function sinf with MPFR_RNDU
libm wrong by up to 3.40e-11 ulp(s) [1] for x=-0x1.47d0fep+34
sin gives -0x1p+0
mpfr_sin gives -0x1.fffffep-1
Total: errors=1 (0.00%) errors2=0 maxerr=3.40e-11 ulp(s)
Mar 29 2022
all values are still correctly rounded, and I confirm the speed improvement:
zimmerma@tomate:/tmp/core-math$ LIBM=/users/zimmerma/svn/core-math/libllvmlibc.a ./perf.sh expm1f # previous code
22.692
54.039
53.218
zimmerma@tomate:/tmp/core-math$ LIBM=/tmp/libllvmlibc.a ./perf.sh expm1f # new code
22.698
54.037
17.240
The llvm-libc results are the 3rd ones (first CORE-MATH, second GNU libc).
Mar 25 2022
ok for me, all exhaustive tests pass, and the performance increased a lot:
zimmerma@tomate:~/svn/core-math$ LIBM=/users/zimmerma/svn/core-math/libllvmlibc.a ./perf.sh expf
21.594
10.968
51.286
zimmerma@tomate:~/svn/core-math$ LIBM=/tmp/libllvmlibc.a ./perf.sh expf
21.596
10.966
16.997
The first run is with the previous version, the second one with the new version. The last timing is the one for llvm-libc, the first one for core-math, and the 2nd one for the GNU libc (not CR).
Mar 24 2022
I confirm the results are still correctly rounded for all four rounding modes. For rounding to nearest the reciprocal throughput/latency decreased from 56/103.5 cycles to 26.6/76.0 cycles on a Core i5-4590.
As a comparison, the core-math code runs in 21.2/63.2 cycles.
Mar 15 2022
ok for me too, I confirm all exhaustive searches do pass. Great!
Do you have any performance numbers comparing before and after?
Mar 14 2022
ok for me too, all exhaustive tests do pass!
Mar 11 2022
my exhaustive search confirms it is correctly rounded for all four rounding modes, great!
Feb 4 2022
apart from the compiler warnings, all exhaustive tests comparing to MPFR do pass, for all four rounding modes. Good work!
I get several warnings when I compile this version:
In file included from /localdisk/zimmerma/llvm-project/libc/src/stdlib/strtold.cpp:11:
In file included from /localdisk/zimmerma/llvm-project/libc/src/__support/str_to_float.h:16:
/localdisk/zimmerma/llvm-project/libc/src/__support/high_precision_decimal.h:115:42: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'uint32_t' (aka 'unsigned int') [-Wsign-compare]
if (roundToDigit < 0 || roundToDigit >= this->num_digits) {
~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~
/localdisk/zimmerma/llvm-project/libc/src/__support/high_precision_decimal.h:121:26: warning: comparison of integers of different signs: 'int' and 'uint32_t' (aka 'unsigned int') [-Wsign-compare]
roundToDigit + 1 == this->num_digits) {
~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~
Jan 31 2022
the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.
However since it changed in the meantime, I don't have resources any more to review the new version.
Jan 28 2022
I'm still running semi-exhaustive tests, it takes some time. I wonder whether a full exhaustive test is possible, by comparing the LLVM implementation with the code from Alexei at https://core-math.gitlabpages.inria.fr/. On a 64-core machine (Intel Xeon Gold 6130 @ 2.10GHz), it takes 4.6s to check 2^33 pairs (x,y). If one tests only positive x,y and x>=y, an exhaustive comparison would have to check 2^61 pairs for each rounding mode, which would take less than 1.5 days using 10000 such machines. This would not be a proof, but the probability that both codes are wrong for the same inputs and give exactly the same wrong answer is quite small.
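The estimate can be re-derived from the figures quoted in this comment. The sketch below is only back-of-the-envelope arithmetic: it assumes the work scales perfectly across machines, and the function and constant names are made up for illustration. With these inputs it comes out at roughly 1.4 days per rounding mode.

```python
# Back-of-the-envelope estimate; all inputs are the figures quoted above.
PAIRS_PER_RUN = 2**33        # pairs (x, y) checked in one timed run
SECONDS_PER_RUN = 4.6        # wall time of that run on one 64-core machine
TOTAL_PAIRS = 2**61          # positive x, y with x >= y, one rounding mode
MACHINES = 10_000            # assumes perfect scaling across machines

def days_needed(total_pairs, pairs_per_run, seconds_per_run, machines):
    """Wall-clock days to check total_pairs, split evenly over the machines."""
    runs = total_pairs / pairs_per_run
    seconds = runs * seconds_per_run / machines
    return seconds / 86_400

days = days_needed(TOTAL_PAIRS, PAIRS_PER_RUN, SECONDS_PER_RUN, MACHINES)
print(f"{days:.2f} days per rounding mode")
```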
Jan 26 2022
I get some errors for rounding to nearest:
Difference for 0x1.faf49ep+25,0x1.480002p+23
llvm_hypot: 0x1.00c5bp+26
as_hypot: 0x1.00c5b2p+26
pz_hypot: 0x1.00c5b2p+26
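A disagreement like this one can be settled without any floating-point square root, by comparing x^2 + y^2 exactly against the square of the midpoint between the two candidates. The following is a minimal sketch for rounding to nearest; the function name is made up, and it assumes the two candidates are adjacent binary32 values that bracket the exact result.

```python
from fractions import Fraction

def nearest_of_two(x: float, y: float, lo: float, hi: float) -> float:
    """Return whichever of the adjacent candidates lo < hi is nearest to
    sqrt(x*x + y*y), decided exactly (no square root is computed)."""
    s = Fraction(x) ** 2 + Fraction(y) ** 2      # exact x^2 + y^2
    mid = (Fraction(lo) + Fraction(hi)) / 2      # exact midpoint of candidates
    if s > mid * mid:
        return hi
    if s < mid * mid:
        return lo
    raise ValueError("exact tie: ties-to-even must be resolved separately")

x = float.fromhex("0x1.faf49ep+25")
y = float.fromhex("0x1.480002p+23")
lo = float.fromhex("0x1.00c5bp+26")     # value returned by llvm_hypot
hi = float.fromhex("0x1.00c5b2p+26")    # value returned by as_hypot and pz_hypot
print(nearest_of_two(x, y, lo, hi).hex())
```

For these inputs the exact sum lies only very slightly above the squared midpoint, which is what makes this a hard-to-round case.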
These are the error messages that we got on aarch64-ubuntu: Buildbot log https://lab.llvm.org/buildbot/#/builders/138/builds/16983/steps/4/logs/stdio
Jan 25 2022
all exhaustive tests do pass, with all four rounding modes. Maybe put the corresponding input values in a comment for the exceptional cases?
Dear Tue,
this revision passes all exhaustive tests, for the four rounding modes, great work!
a performance graph is available at https://core-math.gitlabpages.inria.fr/graph_perf_hypotf.pdf
Jan 20 2022
attached is a file with 1200 binary32 exact cases with ulp(x)=2^12*ulp(y), x^2+y^2=z^2 having up to 72 bits. You might add them to your test cases.
I'm ok with the new revision. However I see there are still some calls to get_round(). Did you try to replace them by floating-point operations?
I got similar results with binary64:
Checking hypot with llvm-project and rndu
Using seed 1078001
NEW hypot 0 -1 0x1.ccbbbcfef3c02p-523,0x1.924bf639c1a94p+500 [1.00] 1 1
libm gives 0x1.924bf639c1a94p+500
mpfr gives 0x1.924bf639c1a95p+500
after fixing my stress program I was able to find one value which does not seem to be correctly rounded (for binary32 and rounding up):
zimmerma@biscotte:~/svn/tbd/20/src/binary32$ CFLAGS=-DCHECK_CR LLVM=llvm-project VERBOSE=-v RND=rndu ./doitb.llvm hypot 1000
Checking hypot with llvm-project and rndu
Using seed 1076573
NEW hypot 0 -1 0x1.ffffecp-1,-0x1.000002p+27 [1.00] 1 1
libm gives 0x1.000002p+27
mpfr gives 0x1.000004p+27
Please can you confirm?
Jan 19 2022
the stress tests were successful (for all four rounding modes, both in single and double precision).
Thus I am ok with this version, thanks!
I still get warnings with the latest revision:
/localdisk/zimmerma/llvm-project/libc/src/__support/FPUtil/Hypot.h:149:22: warning: hexadecimal floating literals are a C++17 feature [-Wc++17-extensions]
if ((y != 0) && (0x1p0f + 0x1p-24f != 0x1p0f)) {
^
Dear Tue,
Jan 14 2022
Dear Tue,
Dec 23 2021
Dear Santosh,
Dec 21 2021
Dear Santosh,
Dec 20 2021
Dear Santosh,
Dec 17 2021
the new version applies cleanly to the main branch. I have tested it on x86_64 under Linux (haswell). I confirm it is CR for rounding to nearest, and I get 3 failures if I disable the 3 exceptional cases. For other rounding modes I get 8 failures for rounding towards zero (with the exceptional cases), 8 failures too for rounding towards -Inf, and 7 failures for rounding towards +Inf.
Dec 16 2021
a rebase is needed so that this patch can be applied on the 'main' branch
this revision is fine for me (for rounding to nearest), thanks!
Dec 15 2021
this patch does not apply to the current main branch (db5aceb):
$ patch -p1 -i /tmp/D115828.diff
patching file libc/config/linux/aarch64/entrypoints.txt
Hunk #1 FAILED at 136.
1 out of 1 hunk FAILED -- saving rejects to file libc/config/linux/aarch64/entrypoints.txt.rej
It seems this patch was built on the branch which adds logf.
Dec 14 2021
I confirm the new version is CR for all cases in rounding to nearest. A way to make the exceptional cases CR for directed rounding modes is the following:
Dec 13 2021
maybe I did something wrong, but with the latest version I get two failures for x=0x1.2f1fd6p+3 and x=0x1.bacb4ap+25.
If I disable the test for exceptional values I get five failures: those two, plus x=0x1.01a33ep+0, x=0x1.b121a6p+76 and x=0x1.6351d8p+95.
Dec 10 2021
I confirm this version is correctly rounded for all binary32 inputs and rounding to nearest, by exhaustive testing. For other rounding modes I find 46 incorrect roundings for rounding towards zero, 46 for rounding towards +Inf, and 44 for rounding towards -Inf. I guess part of them are due to the hard-coded values for the 21 exceptional cases, which are on the wrong side with probability 1/2 each. Thus with little additional effort you could get a correctly rounded function for all rounding modes.
Dec 9 2021
Do you know the cost in latency/throughput of the switch() with the 21 exceptional cases? Another way would be to perform a rounding test once the approximation of log has been computed, and go in the switch only if the test fails (which would happen very rarely).
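The rounding-test idea can be sketched as follows for rounding to nearest. This is not the patch's code: `math.log` stands in for the fast double-precision approximation, the relative error bound of 2^-45 is an assumed figure rather than a proven one, and `slow_path` is a hypothetical callback standing for the switch over the exceptional cases.

```python
import math
import struct

def to_binary32(v: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def logf_sketch(x, rel_err=2.0**-45, slow_path=None):
    """Rounding test for round-to-nearest: accept the fast result only if
    both ends of the error interval round to the same binary32 value."""
    y = math.log(x)                    # stand-in for the fast approximation
    err = abs(y) * rel_err             # assumed error bound, not a proven one
    lo, hi = to_binary32(y - err), to_binary32(y + err)
    if lo == hi:
        return lo                      # rounding is unambiguous
    if slow_path is None:
        raise ArithmeticError("rounding test failed and no slow path given")
    return slow_path(x)                # e.g. the switch over exceptional cases

print(logf_sketch(2.0).hex())
```

With a tight error bound the slow path is entered only for inputs whose logarithm lies extremely close to a rounding boundary, so its cost is paid very rarely.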