This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] Update range reduction step for log10f and reduce its latency.
ClosedPublic

Authored by lntue on Apr 5 2023, 7:32 PM.

Details

Summary

Simplify the range reduction step by choosing the reduction constants r
carefully, so that the reduced argument v = r*m_x - 1 and its square v^2 are both
exact in double precision, even without FMA instructions, with -2^-8 <= v < 2^-7.
This allows the polynomial evaluation to be parallelized more efficiently.
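
The exactness claim can be checked numerically. The following is a minimal sketch, not the patch's code: it assumes m_x is a float mantissa in [1, 2) (a multiple of 2^-23) and uses a hypothetical reduction constant r that is a multiple of 2^-8, picked from the top 7 mantissa bits. The function names and the formula for r are illustrative only; the patch's table is chosen more carefully to obtain the asymmetric range quoted above. With these assumptions r*m_x is a multiple of 2^-31 below 2, so it fits in 53 bits, and v has at most about 24 significant bits, so v^2 fits as well.

```python
from fractions import Fraction

def float32_mantissas(step=1):
    """Yield every `step`-th float mantissa m_x in [1, 2) as a double."""
    for bits in range(0, 1 << 23, step):
        yield 1.0 + bits / (1 << 23)  # exact: 24 significant bits

def reduction_constant(m):
    """HYPOTHETICAL constant r ~ 1/m_x, rounded to a multiple of 2^-8.

    This is an illustrative formula, not the table used by the patch.
    """
    i = int((m - 1.0) * 128)            # index from the top 7 mantissa bits
    center = 1.0 + (i + 0.5) / 128      # midpoint of the i-th subinterval
    return round(256 / center) / 256    # multiple of 2^-8 in [0.5, 1]

def check_exact(step=4099):
    """Compare double arithmetic against exact rational arithmetic."""
    worst = 0.0
    for m in float32_mantissas(step):
        r = reduction_constant(m)
        v = r * m - 1.0                  # plain multiply and subtract, no FMA
        exact_v = Fraction(r) * Fraction(m) - 1
        assert Fraction(v) == exact_v               # v is exact
        assert Fraction(v * v) == exact_v * exact_v  # v^2 is exact
        worst = max(worst, abs(v))
    return worst

worst = check_exact()
print("largest |v| seen:", worst)
```

For this hypothetical table the sampled |v| stays below 2^-7; it does not reproduce the tighter lower bound -2^-8 stated in the summary.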

Event Timeline

lntue created this revision. Apr 5 2023, 7:32 PM
lntue requested review of this revision. Apr 5 2023, 7:32 PM
zimmermann6 requested changes to this revision. Apr 6 2023, 2:16 AM

I get a failure for rounding down:

zimmerma@biscotte:~/svn/core-math$ CORE_MATH_CHECK_STD=true LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./check.sh log10f
Running exhaustive check in --rndn mode...
all ok
Running exhaustive check in --rndz mode...
all ok
Running exhaustive check in --rndu mode...
all ok
Running exhaustive check in --rndd mode...
FAIL x=0x1p+0 ref=0x0p+0 y=-0x0p+0

Also, on an AMD EPYC 7282 I get a regression in speed. With master:

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh log10f
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 10.531 + 0.273 clc/call; Median-Min = 0.281 clc/call; Max = 13.047 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 18.529 + 0.342 clc/call; Median-Min = 0.309 clc/call; Max = 19.811 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 13.059 + 0.526 clc/call; Median-Min = 0.290 clc/call; Max = 15.586 clc/call;

With this patch:

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh log10f
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 10.534 + 0.297 clc/call; Median-Min = 0.303 clc/call; Max = 11.415 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 18.529 + 0.561 clc/call; Median-Min = 0.327 clc/call; Max = 20.729 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 19.791 + 0.313 clc/call; Median-Min = 0.338 clc/call; Max = 22.809 clc/call;
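
The size of the regression can be read off the Min values. The following is a minimal sketch, assuming the third measurement in each run is the LLVM libc one (the labeled perf.sh output later in this thread lists CORE-MATH, system libc, then LIBC in that order); the helper name min_cycles is illustrative and not part of perf.sh.

```python
import re

# Matches the "Min = <cycles> + <error> clc/call" field of a perf.sh line.
MIN_RE = re.compile(r"Min = (\d+\.\d+) \+ (\d+\.\d+) clc/call")

def min_cycles(perf_output):
    """Return the Min cycles/call value of every measurement, in order."""
    return [float(m.group(1)) for m in MIN_RE.finditer(perf_output)]

MASTER = """
Ntrial = 20 ; Min = 10.531 + 0.273 clc/call; Median-Min = 0.281 clc/call; Max = 13.047 clc/call;
Ntrial = 20 ; Min = 18.529 + 0.342 clc/call; Median-Min = 0.309 clc/call; Max = 19.811 clc/call;
Ntrial = 20 ; Min = 13.059 + 0.526 clc/call; Median-Min = 0.290 clc/call; Max = 15.586 clc/call;
"""

PATCH = """
Ntrial = 20 ; Min = 10.534 + 0.297 clc/call; Median-Min = 0.303 clc/call; Max = 11.415 clc/call;
Ntrial = 20 ; Min = 18.529 + 0.561 clc/call; Median-Min = 0.327 clc/call; Max = 20.729 clc/call;
Ntrial = 20 ; Min = 19.791 + 0.313 clc/call; Median-Min = 0.338 clc/call; Max = 22.809 clc/call;
"""

# ASSUMPTION: index 2 is the LLVM libc measurement.
master_llvm = min_cycles(MASTER)[2]
patch_llvm = min_cycles(PATCH)[2]
slowdown = patch_llvm / master_llvm
print(f"LLVM libc: {master_llvm} -> {patch_llvm} clc/call, ratio {slowdown:.3f}")
```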

This revision now requires changes to proceed. Apr 6 2023, 2:16 AM
lntue updated this revision to Diff 511415. Apr 6 2023, 7:42 AM

Fix special case log10(1.0f) = 0.0f, and improve performance.
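
For background on why only round-down was affected: IEEE 754 specifies that an exact zero sum of two operands with opposite signs is +0 in every rounding mode except round toward negative infinity, where it is -0. Whether the -0 for log10f(1.0f) arose from exactly such a cancellation is an assumption here; the thread only reports the symptom. Python floats cannot switch rounding mode portably, so the sketch below uses the decimal module, which follows the same sign rule, as an analogy rather than a reproduction of the bug. The function name exact_cancellation is illustrative.

```python
from decimal import Decimal, localcontext, ROUND_FLOOR, ROUND_HALF_EVEN

def exact_cancellation(rounding):
    """Compute 1 + (-1) under the given rounding mode."""
    with localcontext() as ctx:
        ctx.rounding = rounding
        return Decimal(1) + Decimal(-1)

nearest = exact_cancellation(ROUND_HALF_EVEN)  # zero with a positive sign
down = exact_cancellation(ROUND_FLOOR)         # zero with a negative sign

print("round to nearest:", nearest, "signed:", nearest.is_signed())
print("round down:      ", down, "signed:", down.is_signed())
```

The two results compare equal as numbers, which is why a check that compares results bit for bit, as the exhaustive check above does, is needed to catch the difference.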

lntue added a comment. Apr 6 2023, 7:47 AM

I've fixed the issue with signed zero. Here are the current performance numbers I get on a Ryzen 5900X:

$ ./perf.sh log10f --path2
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 10.004 + 0.159 clc/call; Median-Min = 0.198 clc/call; Max = 10.279 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 16.652 + 0.222 clc/call; Median-Min = 0.264 clc/call; Max = 16.999 clc/call;
-- LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 8.586 + 0.194 clc/call; Median-Min = 0.204 clc/call; Max = 8.921 clc/call;

And for latency:

$ ./perf.sh log10f --path2 --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 35.298 + 0.314 clc/call; Median-Min = 0.312 clc/call; Max = 35.782 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 48.766 + 0.755 clc/call; Median-Min = 0.917 clc/call; Max = 49.771 clc/call;
-- LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 34.878 + 0.203 clc/call; Median-Min = 0.246 clc/call; Max = 35.256 clc/call;

lntue updated this revision to Diff 511422. Apr 6 2023, 8:02 AM

Fix comments on range reduction constant formula.

zimmermann6 accepted this revision. Apr 6 2023, 11:07 PM

The failure is fixed now, and the performance is slightly better than the 'main' branch on my machine:

zimmerma@biscotte:~/svn/core-math$ LIBM=/localdisk/zimmerma/llvm-project/build/projects/libc/lib/libllvmlibc.a ./perf.sh log10f
GNU libc version: 2.36
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 10.523 + 0.392 clc/call; Median-Min = 0.332 clc/call; Max = 11.603 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 18.521 + 0.418 clc/call; Median-Min = 0.330 clc/call; Max = 19.653 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 12.929 + 0.626 clc/call; Median-Min = 0.297 clc/call; Max = 15.292 clc/call;

This revision is now accepted and ready to land. Apr 6 2023, 11:07 PM