This is an archive of the discontinued LLVM Phabricator instance.

[libc] Implement correctly rounded logf based on RLIBM library.
ClosedPublic

Authored by lntue on Dec 8 2021, 4:36 PM.

Diff Detail

Event Timeline

lntue created this revision.Dec 8 2021, 4:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2021, 4:36 PM
lntue requested review of this revision.Dec 8 2021, 4:36 PM

Just a few cosmetic comments for now. Needs a rebase though to absorb a lot style changes.

libc/src/math/generic/logf.cpp
38

s/check out/refer to

39

s/ribm/rlibm

Also, seems like this is a yet to be published paper. So, we should update the link when it gets published.

47

For this and others, when they are already in an internal namespace which has a prefix of __, we shouldn't use __ prefix for globals. Also, can this be a constexpr?

michaelrj added inline comments.Dec 9 2021, 10:17 AM
libc/src/__support/FPUtil/x86_64/PolyEvalDouble.h
2

nit: formatting

libc/src/__support/FPUtil/x86_64/PolyEvalFloat.h
2

nit: formatting

Do you believe this is a permalink? Maybe cite a paper of the algorithm?

lntue updated this revision to Diff 393244.Dec 9 2021, 11:52 AM

Updating D115408: [libc] Implement correctly rounded logf based on RLIBM library.

lntue updated this revision to Diff 393325.Dec 9 2021, 4:06 PM
lntue marked an inline comment as done.

[libc] Implement correctly rounded logf based on RLIBM library.

lntue marked an inline comment as done.Dec 9 2021, 4:08 PM

Do you believe this is a permalink? Maybe cite a paper of the algorithm?

Thanks, I cited the full title of the paper instead.

libc/src/__support/FPUtil/x86_64/PolyEvalDouble.h
2

This file was moved to https://reviews.llvm.org/D115347

libc/src/__support/FPUtil/x86_64/PolyEvalFloat.h
2

This file was moved to https://reviews.llvm.org/D115347

libc/src/math/generic/logf.cpp
39

I added the full reference of the paper.

lntue updated this revision to Diff 393350.Dec 9 2021, 5:46 PM

[libc] Implement correctly rounded logf based on RLIBM library.

Do you know the cost in latency/throughput of the switch() with the 21 exceptional cases? Another way would be to perform a rounding test once the approximation of log has been computed, and go in the switch only if the test fails (which would happen very rarely).

I confirm this version is correctly rounded for all binary32 inputs and rounding to nearest, by exhaustive testing. For other rounding modes I find 46 incorrect roundings for rounding towards zero, 46 for rounding towards +Inf, and 44 for rounding towards -Inf. I guess part of them are due to the hard-coded values for the 21 exceptional cases, which are on the wrong side with probability 1/2 each. Thus with little additional effort you could get a correctly rounded function for all rounding modes.

One reason there are 21 exceptional inputs is because this patch uses a different polynomial evaluation than Horner's method that RLIBM uses. Let me generate a new polynomial using the polynomial evaluation in the patch. It will have significantly fewer exceptional inputs for a 5th-degree polynomial for log.

Addressing Paul Zimmerman's comment about other rounding modes, if you use the RLIBM polynomial generated with the round-to-odd for a 34-bit FP, it will produce correct results for all rounding modes. Isn't that a better solution than using the round-to-nearest polynomial and then tweaking it with exceptional inputs?

If the goal is to generate 5th degree polynomial for log for the round-to-nearest with the specified polynomial evaluation in the patch.

Here is the best polynomial that I could generate as of now with 3 special case inputs;

Polynomial: y=0.0000000000000000000000000000000000000000000000000000000000000000000000e+00 x^(0) + 1.0000000000073561157165613622055388987064361572265625000000000000000000e+00 x^(1) + -5.0000000925149434838345996467978693544864654541015625000000000000000000e-01 x^(2) + 3.3333714042884438066849384085799101740121841430664062500000000000000000e-01 x^(3) + -2.5063110611886163514583358846721239387989044189453125000000000000000000e-01 x^(4) + 2.3503273830199439276000816789746750146150588989257812500000000000000000e-01 x^(5)

Special case inputs:

1.7966015845941743847569149750142969423905014991760253906250000000000000e-03
2.6190722430193863652647667805695164133794605731964111328125000000000000e-03
3.7279721707274009363797251381811292958445847034454345703125000000000000e-03

The constant term is 0.

lntue updated this revision to Diff 393721.Dec 11 2021, 10:44 PM

Use a more accurate polynomial provided by @santoshn.

I've updated the patch using the polynomial provided by @santoshn . I've tried to benchmark various implementations, and it seems like the most efficient way to compute the quintic polynomial with zero free coeff that @santoshn provided is as follow:

  • First compute it as a quartic polynomial r = a_5 x^4 + a4 x^3 + ... + a1 = polyeval(x, a1, a2, a3, a4, a5) = fma(polyeval(x, a2, a3, a4, a5), x, a1) // cubic polynomial is optimized in src/math/__support/FPUtil/x86_64/PolyEval.h
  • Then use another fma to multiply r by x, and the add the extra part (m * log(2) + log(f)): result = fma(r, x, fma(m, log(2), log(f)))

With this computation, the exhaustive test found out that the number of exception cases reduced to 5: // Input decimal: 1.00639712810516357421875000000000000000000000000000

//   MPFR result: 0.00637675332837318907527892795527933097341434970404
// Input decimal: 9.47263622283935546875000000000000000000000000000000
//   MPFR result: 2.24840724468231192940413656173141803167922920169308
// Input decimal: 58037908.00000000000000000000000000000000000000000000000000
//   MPFR result: 17.87660694122314468798277511771537045730722298001695
// Input decimal: 127837836949849943048192.00000000000000000000000000000000000000000000000000
//   MPFR result: 53.20504951477050802794144423622214960099434498468168
// Input decimal: 54983060754563292101907316736.00000000000000000000000000000000000000000000000000
//   MPFR result: 66.17682266235352211194320433287158374936494916276426

The last 2 large exceptional value might come from evaluating m*log(2) + log(f) with fma.

@zimmermann6 : About latency: The early switch cases for exceptional values did slow down the whole function by 20-25%. I was playing around with it and found out that by moving the switch for exceptional cases close to the final fma call, the extra latency due to the switch is reduce to only ~5% of the whole function, which I think is acceptable.
About correct rounding for all the rounding modes, even though it would be a few extra switches, I'll need to update the testing infrastructure, especially the support for MPFR with all rounding modes, and the exhaustive tests for that. Also there is another design question needs to be considered is to support all rounding modes at compile-time and run-time. Since that works quite significant by itself, I would prefer to add the support for other rounding modes in a different patch.

maybe I did something wrong, but with the latest version I get two failures for x=0x1.2f1fd6p+3 and x=0x1.bacb4ap+25.
If I disable the test for exceptional values I get five failures, those two and x=0x1.01a33ep+0, x=0x1.b121a6p+76 and 0x1.6351d8p+95.

With the following minimax polynomial found using Sollya I get only four failures (the value x=0x1.01a33ep+0 is CR):
0.19620128080245119450708557451434899121522903442383 * x^5 + -0.24996749678377125358785804110084427520632743835449 * x^4 + 0.33333320666203714033315463893814012408256530761719 * x^3 + -0.49999999978385922805301788685028441250324249267578 * x^2 + 0.9999999999998795408018281705153640359640121459961 * x

lntue updated this revision to Diff 393947.Dec 13 2021, 9:50 AM

Fix double-rounding errors for exceptional cases.

lntue added a comment.Dec 13 2021, 9:51 AM

maybe I did something wrong, but with the latest version I get two failures for x=0x1.2f1fd6p+3 and x=0x1.bacb4ap+25.
If I disable the test for exceptional values I get five failures, those two and x=0x1.01a33ep+0, x=0x1.b121a6p+76 and 0x1.6351d8p+95.

With the following minimax polynomial found using Sollya I get only four failures (the value x=0x1.01a33ep+0 is CR):
0.19620128080245119450708557451434899121522903442383 * x^5 + -0.24996749678377125358785804110084427520632743835449 * x^4 + 0.33333320666203714033315463893814012408256530761719 * x^3 + -0.49999999978385922805301788685028441250324249267578 * x^2 + 0.9999999999998795408018281705153640359640121459961 * x

Something is not right, since all of the failures that you found were already in the list of exceptional values:
(hexfloat, decimal, hex bits in float)
(0x0x1.01a33ep+0, 1.00639712810516357421875, 0x3f80d19f)
(0x1.2f1fd6p+3, 9.47263622283935546875, 0x41178feb)
(0x1.bacb4ap+25, 58037908.0, 0x4c5d65a5)
(0x1.b121a6p+76, 127837836949849943048192.0, 0x65d890d3)
(0x1.6351d8p+95, 54983060754563292101907316736.0, 0x6f31a8ec)

Nonetheless, after clearing the build cache and re-run the tests, I found 2 exceptional cases' results were off due to double-conversion when I tried to print their hexadecimal floats (inadvertently using intermediate doubles).
You might try clear the build cache and rerun the tests again.

I confirm the new version is CR for all cases in rounding to nearest. A way to make the exceptional cases CR for directed rounding modes is the following:

case 0x3f80d19f:
  {
    volatile double h = 0x1.a1e82cp-8f, l = -0x1.fffd1ep-33f;
    return h + l;
  }

Or compile with -frounding-math to avoid the "volatile" trick.

zimmermann6 accepted this revision.Dec 16 2021, 9:09 AM

this revision is fine to me (for rounding to nearest), thanks!

This revision is now accepted and ready to land.Dec 16 2021, 9:09 AM
sivachandra accepted this revision.Dec 16 2021, 9:37 AM
santoshn accepted this revision.Dec 16 2021, 9:52 AM
jpl169 accepted this revision.Dec 16 2021, 9:56 AM
cqlauter added inline comments.Dec 17 2021, 12:19 PM
libc/src/math/generic/logf.cpp
22

I think that this addition will burn a lot of accuracy in the case when the exponent m = -1 and 1.mant is close to 2. This is catastrophic cancellation.

There is a trick to avoid the cancellation, typically by bringing 1.mant into the range 0.75 <= 1.mant <= 1.5, which can be done by adding in a 1 into the IEEE754 representation at the right place.

cqlauter added inline comments.Dec 17 2021, 12:23 PM
libc/src/math/generic/logf.cpp
174

Overall, I think that there should be way more comments for math functions like this one. A line like the one I am highlighting is so hard to understand, even when you are working on that code. Imagine you open up the code in 5 years and need to reunderstand a line like this.