Algorithm for hypotf: compute (a*a + b*b) in double precision, then use Dekker's algorithm to find the rounding error, and then correcting it after taking its square-root.
Incorrect variable naming style at many places in this function.
A call to a target independent builtin for a standard function can lead to a call back to the libc. That is, compilers are free to call the sqrt function from the libc. We can refactor our sqrt implementation so that we can replace this call to __builtin_sqrt with a call to LLVM libc's sqrt.
I refactor our sqrt implementation in https://reviews.llvm.org/D118173
I get some errors for rounding to nearest:
Difference for 0x1.faf49ep+25,0x1.480002p+23 llvm_hypot: 0x1.00c5bp+26 as_hypot: 0x1.00c5b2p+26 pz_hypot: 0x1.00c5b2p+26
I hadn't seen that trick to compute the rounding error, do you have a reference?
If I understand correctly, err should get the rounding error of the sum. The algorithm is known as TwoSum. It needs 6 operations, including the sum sumSq, and this is the same number of operations as you have. But with 6 add/sub operations, I proved that there is only one algorithm (up to obvious symmetries) that works. And the above one is different. Thus it will be sometimes incorrect. I think that if xSq and ySq are close to each other and their sum is not exact, then the above algorithm will give you twice the rounding error.
Thanks Paul for finding the error! For this example, the sum square is actually exact, but the rounding errors still need to be updated in order to avoid double rounding errors. By removing the check (err != 0) in line 36, I got it back correctly.
Thanks Paul and Vincent for finding the issue with this! Actually I was trying to implement the Fast2Sum version since we are in radix-2. Moreover, since both xSq and ySq are non-negative:
max(xSq, ySq) <= sumSq <= sqrt(2) max(xSq, ySq)
and so sumSq - max(xSq, ySq) is exact. I was trying to implement it without branch and thought that (sumSq - min(xSq, ySq)) - max(xSq, ySq) = 0, which could be easily disproved by the following example: Consider single precision with xSq = 1 + 2^(-23) (I know it's not a square) and ySq = 2^(-24) with default rounding mode, then sumSq = xSq + ySq = 1 + 2^(-22). Then:
sumSq - xSq = 2^(-23) and sumSq - ySq = 1 + 2^(-22) = sumSq
(sumSq - xSq) - ySq = 2^-24 (sumSq - ySq) - xSq = 2^-23 ((sumSq - xSq) - ySq) + ((sumSq - ySq) - xSq) != 2^(-24) which is the rounding error.
I change it back to the normal Fast2Sum implementation with branching now, so the rounding error computation should be correct.
I'm still running semi-exhaustive tests, it takes some time. I wonder whether a full exhaustive test is possible, by comparing the LLVM implementation with the code from Alexei at https://core-math.gitlabpages.inria.fr/. On a 64-core machine (Intel Xeon Gold 6130 @ 2.10GHz), it takes 4.6s to check 2^33 pairs (x,y). If one tests only positive x,y and x>=y, as exhaustive comparison would have to check 2^61 pairs for each rounding mode, which would take less than 1.5 month using 10000 such machines. This would not be a proof, but the probability that both codes are wrong for the same inputs and give exactly the same wrong answer is quite small.
Or if you don't mind to be slower, you can compare it with the shift-and-add algorithm implemented in the LLVM-libc that this one is trying to speed up, since that can be proved mathematically to be correct.
Another option is that since the idea of this algorithm is scalable, we can have a version of it for half precision (essentially just change the data types and masks/constants), where we it can be tested exhaustively? That should at least increase the confidence with single, and maybe double precision later?
Add a quick return when the exponent difference is at least 2 more than the mantissa length.
the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.
However since it changed in the meantime, I don't have resources any more to review the new version.
Thanks Paul for checking! The new version only changes when exponent of x >= exponent of y + 25, so your verification should still hold.
Add exhaustive testing to test for inputs (x, y) with 2^23 <= x < 2^24, 2^(23 + 14) <= y < 2^(23 + 25).
@zimmermann6 : I've finished testing the remaining pairs: (x, y) with 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 14 <= k <= 24.