This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] Update range reduction step for logf and reduce its latency.
ClosedPublic

Authored by lntue on Apr 6 2023, 6:48 PM.

Details

Summary

Simplify the range reduction steps by choosing the reduction constants
carefully so that the reduced arguments v = r*m_x - 1 and v^2 are exact in double
precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the
polynomial evaluations to be parallelized more efficiently.
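
For readers who want to see the idea in code, here is a minimal sketch of this kind of range reduction. It is not the code from this patch: the names `Reduced`, `reduce`, and `logf_sketch` are hypothetical, r is computed on the fly instead of being read from a table of carefully chosen constants, `std::log(r)` stands in for a table lookup, and the polynomial uses plain Taylor coefficients. With this simple choice of r the sketch only guarantees |v| < 2^-7, not the tighter bound -2^-8 <= v < 2^-7 stated above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Illustrative only: names, the way r is chosen, and the polynomial
// coefficients are assumptions, not the constants used by the patch.
struct Reduced {
  int e;     // unbiased exponent of x
  double m;  // mantissa m_x in [1, 2)
  double r;  // short approximation of 1/m_x (a multiple of 2^-8)
  double v;  // reduced argument v = r*m_x - 1
};

inline Reduced reduce(float x) {
  // This sketch handles positive normal inputs only.
  assert(std::isnormal(x) && x > 0.0f);
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  std::uint32_t mant = bits & 0x007fffffu;
  std::uint32_t m_bits = mant | 0x3f800000u;  // same mantissa, exponent 0
  float m_f;
  std::memcpy(&m_f, &m_bits, sizeof(m_f));

  Reduced out;
  out.e = static_cast<int>(bits >> 23) - 127;
  out.m = static_cast<double>(m_f);
  // The top 7 mantissa bits select one of 128 subintervals of [1, 2).
  unsigned idx = mant >> 16;
  double center = 1.0 + (idx + 0.5) / 128.0;
  // Assumption: a real implementation would read r from a precomputed table.
  out.r = std::round(256.0 / center) / 256.0;
  // r has at most 9 significant bits and m_x has 24, so r*m_x needs at most
  // 33 bits and is exact in double, with or without an FMA instruction.
  out.v = out.r * out.m - 1.0;
  return out;
}

inline double logf_sketch(float x) {
  const double LN2 = 0.6931471805599453;
  Reduced red = reduce(x);
  double v = red.v;
  // v is a multiple of 2^-31 with |v| < 2^-7, so v*v is also exact.
  double v2 = v * v;
  // p_lo and p_hi do not depend on each other, so they can be evaluated
  // in parallel (Estrin-style) once v and v2 are available.
  double p_lo = -0.5 + v * (1.0 / 3.0);
  double p_hi = -0.25 + v * 0.2;
  double log1p_v = v + v2 * (p_lo + v2 * p_hi);  // degree-5 Taylor of log(1+v)
  // log(x) = e*log(2) - log(r) + log(1 + v)
  return red.e * LN2 - std::log(red.r) + log1p_v;
}
```

Because both r*m_x and v^2 are exact, `r * m - 1.0` and `std::fma(r, m, -1.0)` give the same v in this sketch, which is the property that lets FMA and non-FMA builds share one evaluation scheme.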

Event Timeline

lntue created this revision. Apr 6 2023, 6:48 PM
Herald added projects: Restricted Project, Restricted Project. Apr 6 2023, 6:48 PM
lntue requested review of this revision. Apr 6 2023, 6:48 PM
zimmermann6 requested changes to this revision. Apr 7 2023, 12:04 AM

the patch fails to apply to main (revision 10cff75):

$ patch -p1 -i /tmp/D147755.diff 
patching file libc/src/math/generic/common_constants.h
Hunk #1 succeeded at 17 with fuzz 2 (offset -3 lines).
patching file libc/src/math/generic/common_constants.cpp
Hunk #1 FAILED at 109.
Hunk #2 succeeded at 102 with fuzz 2 (offset -28 lines).
1 out of 2 hunks FAILED -- saving rejects to file libc/src/math/generic/common_constants.cpp.rej
patching file libc/src/math/generic/logf.cpp
patching file libc/test/src/math/logf_test.cpp
This revision now requires changes to proceed. Apr 7 2023, 12:04 AM
lntue added a comment. Apr 7 2023, 4:22 AM

Can you try to apply https://reviews.llvm.org/D147676 before this patch to see if it works?

lntue updated this revision to Diff 511688. Apr 7 2023, 7:33 AM

Sync to head.

lntue added a comment. Apr 7 2023, 7:34 AM

I've committed https://reviews.llvm.org/D147676. Can you try to sync to head and just apply this patch? Thanks.

zimmermann6 accepted this revision. Apr 7 2023, 8:38 AM

Thanks, all tests pass now. For the reciprocal throughput I get:

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh logf
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 10.839 + 0.378 clc/call; Median-Min = 0.304 clc/call; Max = 13.593 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 7.240 + 0.351 clc/call; Median-Min = 0.307 clc/call; Max = 9.576 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 18.822 + 0.339 clc/call; Median-Min = 0.314 clc/call; Max = 19.396 clc/call;

and for the latency:

zimmerma@biscotte:~/svn/core-math$ PERF_ARGS=--latency LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh logf
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 46.968 + 0.321 clc/call; Median-Min = 0.301 clc/call; Max = 47.649 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 38.243 + 0.328 clc/call; Median-Min = 0.318 clc/call; Max = 38.928 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 54.652 + 0.404 clc/call; Median-Min = 0.329 clc/call; Max = 55.396 clc/call;
This revision is now accepted and ready to land. Apr 7 2023, 8:38 AM
santoshn accepted this revision. Apr 7 2023, 8:50 AM
lntue added a comment. Apr 7 2023, 9:17 AM

Thanks for verifying this! I'm a little surprised by the performance numbers that you got on EPYC. I got the following results on a Ryzen 5900X.
For reciprocal throughput:

$ ./perf.sh logf
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 9.396 + 0.158 clc/call; Median-Min = 0.140 clc/call; Max = 10.384 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 8.852 + 0.128 clc/call; Median-Min = 0.120 clc/call; Max = 9.382 clc/call;
-- LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 8.767 + 0.253 clc/call; Median-Min = 0.191 clc/call; Max = 9.264 clc/call;

And for latency:

$ ./perf.sh logf --path2 --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 36.244 + 0.893 clc/call; Median-Min = 1.115 clc/call; Max = 38.248 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 29.517 + 0.647 clc/call; Median-Min = 0.851 clc/call; Max = 30.541 clc/call;
-- LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 33.060 + 0.774 clc/call; Median-Min = 0.972 clc/call; Max = 35.069 clc/call;

It's possible that LLVM libc was built without FMA on your machine. But then it looks like a performance regression for non-FMA targets that I need to fix.

lntue updated this revision to Diff 512045. Apr 9 2023, 4:39 PM

Improve performance for non-FMA targets.