This is an archive of the discontinued LLVM Phabricator instance.

[libc] Improve hypotf performance with different algorithm correctly rounded to all rounding modes.
ClosedPublic

Authored by lntue on Jan 25 2022, 8:52 AM.

Details

Summary

Algorithm for hypotf: compute (a*a + b*b) in double precision, then use Dekker's algorithm to find the rounding error, and then correcting it after taking its square-root.

Diff Detail

Event Timeline

lntue created this revision.Jan 25 2022, 8:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2022, 8:52 AM
lntue requested review of this revision.Jan 25 2022, 8:52 AM
sivachandra added inline comments.Jan 25 2022, 9:19 AM
libc/src/math/generic/hypotf.cpp
22

Incorrect variable naming style at many places in this function.

32

A call to a target independent builtin for a standard function can lead to a call back to the libc. That is, compilers are free to call the sqrt function from the libc. We can refactor our sqrt implementation so that we can replace this call to __builtin_sqrt with a call to LLVM libc's sqrt.

lntue added inline comments.Jan 25 2022, 12:20 PM
libc/src/math/generic/hypotf.cpp
32

I refactor our sqrt implementation in https://reviews.llvm.org/D118173
Will wait for that patch to be landed.

zimmermann6 requested changes to this revision.EditedJan 26 2022, 6:07 AM

I get some errors for rounding to nearest:

Difference for 0x1.faf49ep+25,0x1.480002p+23
llvm_hypot: 0x1.00c5bp+26
as_hypot:   0x1.00c5b2p+26
pz_hypot:   0x1.00c5b2p+26
libc/src/math/generic/hypotf.cpp
29–30

I hadn't seen that trick to compute the rounding error, do you have a reference?
By the way, I'm not sure the reference to Dekker is appropriate. For me, Dekker's algorithm splits
two floating-point numbers in two each, and computes their product (high + low part) using 4 multiplies.

This revision now requires changes to proceed.Jan 26 2022, 6:07 AM
vinc17 added a subscriber: vinc17.Jan 26 2022, 6:36 AM
vinc17 added inline comments.
libc/src/math/generic/hypotf.cpp
29–30

If I understand correctly, err should get the rounding error of the sum. The algorithm is known as TwoSum. It needs 6 operations, including the sum sumSq, and this is the same number of operations as you have. But with 6 add/sub operations, I proved that there is only one algorithm (up to obvious symmetries) that works. And the above one is different. Thus it will be sometimes incorrect. I think that if xSq and ySq are close to each other and their sum is not exact, then the above algorithm will give you twice the rounding error.

lntue updated this revision to Diff 403268.Jan 26 2022, 7:51 AM

Remove the check for non-zero error.

lntue added a comment.Jan 26 2022, 7:53 AM

I get some errors for rounding to nearest:

Difference for 0x1.faf49ep+25,0x1.480002p+23
llvm_hypot: 0x1.00c5bp+26
as_hypot:   0x1.00c5b2p+26
pz_hypot:   0x1.00c5b2p+26

Thanks Paul for finding the error! For this example, the sum square is actually exact, but the rounding errors still need to be updated in order to avoid double rounding errors. By removing the check (err != 0) in line 36, I got it back correctly.

lntue marked an inline comment as not done.Jan 26 2022, 8:18 AM
lntue added inline comments.
libc/src/math/generic/hypotf.cpp
29–30

Thanks Paul and Vincent for finding the issue with this! Actually I was trying to implement the Fast2Sum version since we are in radix-2. Moreover, since both xSq and ySq are non-negative:

max(xSq, ySq) <= sumSq <= sqrt(2) max(xSq, ySq)

and so sumSq - max(xSq, ySq) is exact. I was trying to implement it without branch and thought that (sumSq - min(xSq, ySq)) - max(xSq, ySq) = 0, which could be easily disproved by the following example: Consider single precision with xSq = 1 + 2^(-23) (I know it's not a square) and ySq = 2^(-24) with default rounding mode, then sumSq = xSq + ySq = 1 + 2^(-22). Then:

sumSq - xSq = 2^(-23)  and  sumSq - ySq = 1 + 2^(-22) = sumSq

And hence:

(sumSq - xSq) - ySq = 2^-24
(sumSq - ySq) - xSq = 2^-23
((sumSq - xSq) - ySq) + ((sumSq - ySq) - xSq) != 2^(-24)  which is the rounding error.

I change it back to the normal Fast2Sum implementation with branching now, so the rounding error computation should be correct.

I'm still running semi-exhaustive tests, it takes some time. I wonder whether a full exhaustive test is possible, by comparing the LLVM implementation with the code from Alexei at https://core-math.gitlabpages.inria.fr/. On a 64-core machine (Intel Xeon Gold 6130 @ 2.10GHz), it takes 4.6s to check 2^33 pairs (x,y). If one tests only positive x,y and x>=y, as exhaustive comparison would have to check 2^61 pairs for each rounding mode, which would take less than 1.5 month using 10000 such machines. This would not be a proof, but the probability that both codes are wrong for the same inputs and give exactly the same wrong answer is quite small.

lntue marked an inline comment as not done.Jan 28 2022, 9:20 AM

I'm still running semi-exhaustive tests, it takes some time. I wonder whether a full exhaustive test is possible, by comparing the LLVM implementation with the code from Alexei at https://core-math.gitlabpages.inria.fr/. On a 64-core machine (Intel Xeon Gold 6130 @ 2.10GHz), it takes 4.6s to check 2^33 pairs (x,y). If one tests only positive x,y and x>=y, as exhaustive comparison would have to check 2^61 pairs for each rounding mode, which would take less than 1.5 month using 10000 such machines. This would not be a proof, but the probability that both codes are wrong for the same inputs and give exactly the same wrong answer is quite small.

Or if you don't mind to be slower, you can compare it with the shift-and-add algorithm implemented in the LLVM-libc that this one is trying to speed up, since that can be proved mathematically to be correct.

Another option is that since the idea of this algorithm is scalable, we can have a version of it for half precision (essentially just change the data types and masks/constants), where we it can be tested exhaustively? That should at least increase the confidence with single, and maybe double precision later?

lntue updated this revision to Diff 404181.Jan 28 2022, 3:14 PM

Use fputil::sqrt instead of __builtin_sqrtf.

lntue updated this revision to Diff 404184.Jan 28 2022, 3:21 PM

Fix variable names and ignore compiler warnings about C++17.

lntue updated this revision to Diff 404339.Jan 29 2022, 9:48 PM

Add a quick return when the exponent difference is at least 2 more than the mantissa length.

lntue marked an inline comment as done.Jan 30 2022, 6:38 AM
zimmermann6 resigned from this revision.Jan 31 2022, 12:12 AM

the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.
However since it changed in the meantime, I don't have resources any more to review the new version.

the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.
However since it changed in the meantime, I don't have resources any more to review the new version.

Thanks Paul for checking! The new version only changes when exponent of x >= exponent of y + 25, so your verification should still hold.

lntue updated this revision to Diff 404637.Jan 31 2022, 11:37 AM

Add exhaustive testing to test for inputs (x, y) with 2^23 <= x < 2^24, 2^(23 + 14) <= y < 2^(23 + 25).

lntue updated this revision to Diff 404641.Jan 31 2022, 11:39 AM

Fix names in the exhaustive test.

sivachandra accepted this revision.Jan 31 2022, 3:12 PM

OK for the structuring aspects.

This revision is now accepted and ready to land.Jan 31 2022, 3:12 PM
lntue marked an inline comment as done.Feb 13 2022, 8:20 PM

the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.
However since it changed in the meantime, I don't have resources any more to review the new version.

@zimmermann6 : I've finished testing the remaining pairs: (x, y) with 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 14 <= k <= 24.

lntue updated this revision to Diff 408320.Feb 13 2022, 8:22 PM

Sync to HEAD.