Algorithm for hypotf: compute (a*a + b*b) in double precision, then use Dekker's algorithm to find the rounding error, and then correcting it after taking its square-root.

# Details

# Diff Detail

- Repository
- rG LLVM Github Monorepo

### Event Timeline

libc/src/math/generic/hypotf.cpp | ||
---|---|---|

24 | Incorrect variable naming style at many places in this function. | |

34 | A call to a target independent builtin for a standard function can lead to a call back to the libc. That is, compilers are free to call the |

libc/src/math/generic/hypotf.cpp | ||
---|---|---|

34 | I refactor our |

I get some errors for rounding to nearest:

Difference for 0x1.faf49ep+25,0x1.480002p+23 llvm_hypot: 0x1.00c5bp+26 as_hypot: 0x1.00c5b2p+26 pz_hypot: 0x1.00c5b2p+26

libc/src/math/generic/hypotf.cpp | ||
---|---|---|

31–32 | I hadn't seen that trick to compute the rounding error, do you have a reference? |

libc/src/math/generic/hypotf.cpp | ||
---|---|---|

31–32 | If I understand correctly, |

Thanks Paul for finding the error! For this example, the sum square is actually exact, but the rounding errors still need to be updated in order to avoid double rounding errors. By removing the check (err != 0) in line 36, I got it back correctly.

libc/src/math/generic/hypotf.cpp | ||
---|---|---|

31–32 | Thanks Paul and Vincent for finding the issue with this! Actually I was trying to implement the Fast2Sum version since we are in radix-2. Moreover, since both max(xSq, ySq) <= sumSq <= sqrt(2) max(xSq, ySq) and so sumSq - xSq = 2^(-23) and sumSq - ySq = 1 + 2^(-22) = sumSq And hence: (sumSq - xSq) - ySq = 2^-24 (sumSq - ySq) - xSq = 2^-23 ((sumSq - xSq) - ySq) + ((sumSq - ySq) - xSq) != 2^(-24) which is the rounding error. I change it back to the normal Fast2Sum implementation with branching now, so the rounding error computation should be correct. |

I'm still running semi-exhaustive tests, it takes some time. I wonder whether a full exhaustive test is possible, by comparing the LLVM implementation with the code from Alexei at https://core-math.gitlabpages.inria.fr/. On a 64-core machine (Intel Xeon Gold 6130 @ 2.10GHz), it takes 4.6s to check 2^33 pairs (x,y). If one tests only positive x,y and x>=y, as exhaustive comparison would have to check 2^61 pairs for each rounding mode, which would take less than 1.5 month using 10000 such machines. This would not be a proof, but the probability that both codes are wrong for the same inputs and give exactly the same wrong answer is quite small.

Or if you don't mind to be slower, you can compare it with the shift-and-add algorithm implemented in the LLVM-libc that this one is trying to speed up, since that can be proved mathematically to be correct.

Another option is that since the idea of this algorithm is scalable, we can have a version of it for half precision (essentially just change the data types and masks/constants), where we it can be tested exhaustively? That should at least increase the confidence with single, and maybe double precision later?

Add a quick return when the exponent difference is at least 2 more than the mantissa length.

the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.

However since it changed in the meantime, I don't have resources any more to review the new version.

Thanks Paul for checking! The new version only changes when `exponent of x >= exponent of y + 25`, so your verification should still hold.

Add exhaustive testing to test for inputs (x, y) with 2^23 <= x < 2^24, 2^(23 + 14) <= y < 2^(23 + 25).

@zimmermann6 : I've finished testing the remaining pairs: (x, y) with 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 14 <= k <= 24.