Implement correctly rounded logf based on the RLIBM library: https://people.cs.rutgers.edu/~sn349/rlibm/.
Just a few cosmetic comments for now. It needs a rebase, though, to absorb a lot of style changes.
libc/src/math/generic/logf.cpp
Line 39: s/check out/refer to
Line 40: s/ribm/rlibm. Also, this seems to be a yet-to-be-published paper, so we should update the link when it gets published.
Line 48: For this and others: when they are already in an internal namespace that has a __ prefix, we shouldn't use a __ prefix for globals. Also, can this be a constexpr?
Thanks, I cited the full title of the paper instead.
libc/src/__support/FPUtil/x86_64/PolyEvalDouble.h
Line 2 (On Diff #392978): This file was moved to https://reviews.llvm.org/D115347.

libc/src/__support/FPUtil/x86_64/PolyEvalFloat.h
Line 2 (On Diff #392978): This file was moved to https://reviews.llvm.org/D115347.

libc/src/math/generic/logf.cpp
Line 40: I added the full reference of the paper.
Do you know the cost in latency/throughput of the switch() with the 21 exceptional cases? Another way would be to perform a rounding test once the approximation of log has been computed, and go into the switch only if the test fails (which would happen very rarely).
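For reference, here is a minimal sketch of the kind of rounding test I mean (the error bound ERR and the function name are illustrative, not from the patch). It assumes round-to-nearest and a proven bound on the absolute error of the double approximation:

```cpp
// Rounding-test sketch, assuming |r - log(x)| <= ERR for the double
// approximation r.  If r - ERR and r + ERR round to the same float,
// then by monotonicity of rounding every value in between (including
// the exact log(x)) rounds to that float, so it is the correct result.
static bool round_is_certain(double r, float &result) {
  constexpr double ERR = 0x1.0p-42; // hypothetical error bound
  float lo = static_cast<float>(r - ERR);
  float hi = static_cast<float>(r + ERR);
  if (lo == hi) {
    result = lo;
    return true;
  }
  return false; // rare: fall through to the exceptional-case switch
}
```

With such a test, the switch is only reached for the handful of inputs where the approximation error could actually change the rounding.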
I confirm this version is correctly rounded for all binary32 inputs in rounding to nearest, by exhaustive testing. For the other rounding modes I find 46 incorrect roundings for rounding towards zero, 46 for rounding towards +Inf, and 44 for rounding towards -Inf. I guess some of them are due to the hard-coded values for the 21 exceptional cases, which land on the wrong side with probability 1/2 each. Thus, with little additional effort, you could get a correctly rounded function for all rounding modes.
One reason there are 21 exceptional inputs is that this patch uses a polynomial evaluation different from the Horner's method that RLIBM uses. Let me generate a new polynomial using the polynomial evaluation in the patch. It will have significantly fewer exceptional inputs for a 5th-degree polynomial for log.
Addressing Paul Zimmermann's comment about other rounding modes: if you use the RLIBM polynomial generated with round-to-odd for a 34-bit FP representation, it will produce correct results for all rounding modes. Isn't that a better solution than using the round-to-nearest polynomial and then tweaking it with exceptional inputs?
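For readers unfamiliar with round-to-odd, here is a rough sketch of the mechanism (the precision parameter and names are illustrative; the exact 34-bit format is defined in the RLIBM work, not by this snippet):

```cpp
#include <cstdint>
#include <cstring>

// Round a positive, normal double to `prec` significant bits using
// round-to-odd: truncate, and set the last kept bit to 1 whenever any
// discarded bit was nonzero.
static double round_to_odd(double r, int prec) {
  uint64_t bits;
  std::memcpy(&bits, &r, sizeof(bits));
  // A double has 52 explicit fraction bits; keep prec - 1 of them
  // (the leading significand bit is implicit).
  uint64_t discard_mask = (uint64_t{1} << (52 - (prec - 1))) - 1;
  if (bits & discard_mask)
    bits = (bits & ~discard_mask) | (discard_mask + 1);
  std::memcpy(&r, &bits, sizeof(bits));
  return r;
}
```

The point is that with a couple of guard bits beyond float's 24-bit precision, converting the round-to-odd value to float yields the correct result in every rounding mode, provided the polynomial guarantees the round-to-odd property for the exact result, which is what the RLIBM generation ensures.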
If the goal is to generate a 5th-degree polynomial for log for round-to-nearest with the polynomial evaluation specified in the patch, here is the best polynomial that I could generate as of now, with 3 special-case inputs:
Polynomial:
y = 0.0000000000000000000000000000000000000000000000000000000000000000000000e+00 * x^0
  + 1.0000000000073561157165613622055388987064361572265625000000000000000000e+00 * x^1
  - 5.0000000925149434838345996467978693544864654541015625000000000000000000e-01 * x^2
  + 3.3333714042884438066849384085799101740121841430664062500000000000000000e-01 * x^3
  - 2.5063110611886163514583358846721239387989044189453125000000000000000000e-01 * x^4
  + 2.3503273830199439276000816789746750146150588989257812500000000000000000e-01 * x^5
Special case inputs:
1.7966015845941743847569149750142969423905014991760253906250000000000000e-03
2.6190722430193863652647667805695164133794605731964111328125000000000000e-03
3.7279721707274009363797251381811292958445847034454345703125000000000000e-03
The constant term is 0.
I've updated the patch using the polynomial provided by @santoshn. I've tried to benchmark various implementations, and it seems like the most efficient way to compute the quintic polynomial with zero constant term that @santoshn provided is as follows (sketched in code after the list):
- First compute it as a quartic polynomial: r = a5 * x^4 + a4 * x^3 + ... + a1 = polyeval(x, a1, a2, a3, a4, a5) = fma(polyeval(x, a2, a3, a4, a5), x, a1). // The cubic polynomial is optimized in src/math/__support/FPUtil/x86_64/PolyEval.h.
- Then use another fma to multiply r by x and add the extra part (m * log(2) + log(f)): result = fma(r, x, fma(m, log(2), log(f))).
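As a concrete illustration (a sketch only; the names and structure are mine, and m and log(f) are taken as given from the range reduction), the evaluation order above looks like this:

```cpp
#include <cmath>

// Sketch of the evaluation order above.  The coefficients are the exact
// decimal expansions of @santoshn's quintic (constant term 0); x is the
// reduced argument, m the extracted exponent, and log_f the value log(f).
static double log_eval_sketch(double x, double m, double log_f) {
  constexpr double LOG_2 = 0x1.62e42fefa39efp-1; // log(2) rounded to double
  constexpr double A1 = 1.0000000000073561157165613622055388987064361572265625;
  constexpr double A2 = -0.50000000925149434838345996467978693544864654541015625;
  constexpr double A3 = 0.333337140428844380668493840857991017401218414306640625;
  constexpr double A4 = -0.25063110611886163514583358846721239387989044189453125;
  constexpr double A5 = 0.235032738301994392760008167897467501461505889892578125;
  // Cubic part, standing in for the optimized polyeval(x, a2, a3, a4, a5).
  double c = std::fma(std::fma(std::fma(A5, x, A4), x, A3), x, A2);
  // r = a5*x^4 + a4*x^3 + a3*x^2 + a2*x + a1 = fma(c, x, a1).
  double r = std::fma(c, x, A1);
  // result = r*x + (m*log(2) + log(f)), with both steps fused.
  return std::fma(r, x, std::fma(m, LOG_2, log_f));
}
```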
With this computation, the exhaustive test found that the number of exceptional cases is reduced to 5:

// Input decimal: 1.00639712810516357421875000000000000000000000000000
// MPFR result: 0.00637675332837318907527892795527933097341434970404
// Input decimal: 9.47263622283935546875000000000000000000000000000000
// MPFR result: 2.24840724468231192940413656173141803167922920169308
// Input decimal: 58037908.00000000000000000000000000000000000000000000000000
// MPFR result: 17.87660694122314468798277511771537045730722298001695
// Input decimal: 127837836949849943048192.00000000000000000000000000000000000000000000000000
// MPFR result: 53.20504951477050802794144423622214960099434498468168
// Input decimal: 54983060754563292101907316736.00000000000000000000000000000000000000000000000000
// MPFR result: 66.17682266235352211194320433287158374936494916276426
The last 2 large exceptional values might come from evaluating m * log(2) + log(f) with fma.
@zimmermann6: About latency: the early switch cases for exceptional values did slow down the whole function by 20-25%. I was playing around with it and found that by moving the switch for exceptional cases close to the final fma call, the extra latency due to the switch is reduced to only ~5% of the whole function, which I think is acceptable.
About correct rounding for all rounding modes: even though it would only be a few extra switch cases, I'll need to update the testing infrastructure, especially the MPFR support for all rounding modes and the corresponding exhaustive tests. There is also another design question to consider: whether to support all rounding modes at compile time or at run time. Since that is quite significant work by itself, I would prefer to add support for other rounding modes in a separate patch.
Maybe I did something wrong, but with the latest version I get two failures, for x=0x1.2f1fd6p+3 and x=0x1.bacb4ap+25.
If I disable the test for exceptional values I get five failures: those two plus x=0x1.01a33ep+0, x=0x1.b121a6p+76, and x=0x1.6351d8p+95.
With the following minimax polynomial found using Sollya I get only four failures (the value x=0x1.01a33ep+0 is CR):
0.19620128080245119450708557451434899121522903442383 * x^5
- 0.24996749678377125358785804110084427520632743835449 * x^4
+ 0.33333320666203714033315463893814012408256530761719 * x^3
- 0.49999999978385922805301788685028441250324249267578 * x^2
+ 0.9999999999998795408018281705153640359640121459961 * x
Something is not right, since all of the failures that you found were already in the list of exceptional values:
(hexfloat, decimal, hex bits in float)
(0x1.01a33ep+0, 1.00639712810516357421875, 0x3f80d19f)
(0x1.2f1fd6p+3, 9.47263622283935546875, 0x41178feb)
(0x1.bacb4ap+25, 58037908.0, 0x4c5d65a5)
(0x1.b121a6p+76, 127837836949849943048192.0, 0x65d890d3)
(0x1.6351d8p+95, 54983060754563292101907316736.0, 0x6f31a8ec)
Nonetheless, after clearing the build cache and re-running the tests, I found that the results of 2 exceptional cases were off due to a double conversion when I tried to print their hexadecimal floats (I was inadvertently using intermediate doubles).
You might try clearing the build cache and rerunning the tests.
I confirm the new version is CR for all cases in rounding to nearest. A way to make the exceptional cases CR for directed rounding modes is the following:
case 0x3f80d19f: { volatile double h = 0x1.a1e82cp-8f, l = -0x1.fffd1ep-33f; return h + l; }
Or compile with -frounding-math to avoid the "volatile" trick.
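To spell out my reading of this trick (a sketch; the function shape and names are mine): h holds the round-to-nearest result and l a small correction on the correct side, so converting the run-time sum h + l to float lands on the correctly rounded value under whichever rounding mode is in effect. volatile (or -frounding-math) keeps the compiler from folding the sum at compile time under the default round-to-nearest mode.

```cpp
#include <cstdint>

// Sketch of the volatile trick for one exceptional case (illustrative).
static float logf_exceptional_sketch(uint32_t x_bits, float normal_result) {
  switch (x_bits) {
  case 0x3f80d19f: { // x = 0x1.01a33ep+0
    // volatile forces the sum to be evaluated at run time, under the
    // currently selected rounding mode, instead of being constant-folded.
    volatile double h = 0x1.a1e82cp-8, l = -0x1.fffd1ep-33;
    return static_cast<float>(h + l);
  }
  default:
    return normal_result; // non-exceptional inputs take the normal path
  }
}
```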
libc/src/math/generic/logf.cpp
Line 22: I think that this addition will burn a lot of accuracy in the case when the exponent m = -1 and 1.mant is close to 2. This is catastrophic cancellation. There is a trick to avoid the cancellation, typically by bringing 1.mant into the range 0.75 <= 1.mant <= 1.5, which can be done by adding a 1 into the IEEE 754 representation at the right place.
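A sketch of one way to implement this trick, assuming a positive, normal x (the bit manipulation and names are illustrative, not the patch code): adding 2^22 to the bit pattern before extracting the exponent makes m absorb one extra doubling exactly when 1.mant >= 1.5, so the reduced value f ends up in [0.75, 1.5) and log(f) stays away from the cancellation at f -> 2 with m = -1.

```cpp
#include <cstdint>
#include <cstring>

// Range-reduction sketch: decompose x = f * 2^m with 0.75 <= f < 1.5.
static void reduce(float x, int &m, float &f) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  // Adding 1 << 22 carries into the exponent field exactly when the top
  // mantissa bit is set, i.e., when 1.mant >= 1.5.
  m = static_cast<int>((bits + (1u << 22)) >> 23) - 127;
  // Rebuild f = x * 2^-m by subtracting m from the exponent field; the
  // wrap-around arithmetic also handles negative m correctly.
  uint32_t f_bits = bits - (static_cast<uint32_t>(m) << 23);
  std::memcpy(&f, &f_bits, sizeof(f_bits));
}
```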
libc/src/math/generic/logf.cpp
Line 174: Overall, I think that there should be way more comments for math functions like this one. A line like the one I am highlighting is so hard to understand, even when you are working on that code. Imagine you open up the code in 5 years and need to re-understand a line like this.