# Details

# Diff Detail

- Repository: rG LLVM Github Monorepo

### Event Timeline

If this change makes sense, can we just switch comparisons with ulp error of 0.5 to comparisons with ulp error of 0.0?

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

If we change to |input - float(mpfrValue)| / eps(input), we are more or less calculating |input_bitfield - float(mpfrValue)_bitfield|, so in my opinion it's better to use
- if we use eps(input), the calculated ulp will be 2,
- if we use min(eps(input), eps(float(mpfrValue))), the calculated ulp will be 1.

A concrete example is ulp(input = float(1 - 2^(-24)), float(mpfrValue) = float(1)).
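The bit-pattern difference mentioned above can be sketched as follows. This is a minimal sketch, assuming single-precision values that are finite and carry a positive sign bit; `bit_distance` is a hypothetical helper, not a function in MPFRUtils.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of MPFRUtils.cpp): counts how many float
// steps separate `a` and `b` by subtracting their IEEE-754 bit patterns.
// Assumes both values are finite and have a positive sign bit (no -0.0),
// so that ordering the bit patterns orders the values.
std::uint32_t bit_distance(float a, float b) {
  assert(std::isfinite(a) && std::isfinite(b));
  assert(!std::signbit(a) && !std::signbit(b));
  std::uint32_t ua = 0, ub = 0;
  std::memcpy(&ua, &a, sizeof ua);  // well-defined way to read the bits
  std::memcpy(&ub, &b, sizeof ub);
  return ua > ub ? ua - ub : ub - ua;
}
```

For the example above, float(1 - 2^(-24)) and float(1) are adjacent floats, so their bit distance is 1.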

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

It seems like the problem you describe will occur irrespective of this change, no? |float(mpfrValue) - input| / eps(float(mpfrValue)). My reasoning is that the error should be relative to what we think is the correct (more accurate) answer. Since we treat

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

Yes, the same problem occurs when dividing by eps(float(mpfrValue)), with input = 2^n and float(mpfrValue) = 2^n - eps(2^n)/2. So the only reasonable way to return 1 ulp in both cases is to divide by min(eps(input), eps(float(mpfrValue))). In fact, these edge cases are the only time dividing by min(eps(input), eps(float(mpfrValue))) gives a different answer than dividing by eps(input) or by eps(float(mpfrValue)).
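The two mirrored edge cases can be checked numerically. This is a minimal sketch, assuming single precision and taking n = 0; `eps` and the three `ulp_wrt_*` functions are hypothetical names, with eps(x) taken to be the gap between |x| and the next larger float.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical eps(x): the gap between |x| and the next float above it,
// e.g. eps(1.0f) == 2^-23 and eps(1.0f - 2^-24) == 2^-24.
// Assumes x is finite and below the largest finite float.
float eps(float x) {
  const float ax = std::fabs(x);
  assert(ax < std::numeric_limits<float>::max());
  return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// |input - mpfr| measured in each of the candidate units discussed above.
float ulp_wrt_input(float input, float mpfr) {
  return std::fabs(input - mpfr) / eps(input);
}
float ulp_wrt_mpfr(float input, float mpfr) {
  return std::fabs(input - mpfr) / eps(mpfr);
}
float ulp_wrt_min(float input, float mpfr) {
  return std::fabs(input - mpfr) / std::min(eps(input), eps(mpfr));
}
```

With input = float(1 - 2^(-24)) and mpfrValue = 1, dividing by eps(input) gives 1 and dividing by eps(mpfrValue) gives 0.5. Swapping the roles gives 0.5 and 1 instead. Dividing by the minimum gives 1 in both directions.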

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

I want to discuss this example more. Assume we are dealing with single-precision floating-point numbers and that:

- input = float(1 - 2^(-24))
- mpfrResult = float(1)
- eps(mpfrResult) = 2^(-23)
- eps(input) = 2^(-24)
- |mpfrResult - input| = 2^(-24)

So, if we calculate the ulp error with respect to the input, ulp = 2^(-24) / 2^(-24) = 1. If we calculate it with respect to the MPFR result, ulp = 2^(-24) / 2^(-23) = 1/2.

Let's consider the example the other way around:

- input = float(1 + 2^(-23))
- mpfrResult = float(1)
- eps(mpfrResult) = 2^(-23)
- eps(input) = 2^(-23)
- |mpfrResult - input| = 2^(-23)

Here the ulp error is 1 whichever way we calculate it.

Now, consider another example:

- input = float(1 - 2^(-24))
- mpfrResult = float(1 + 2^(-23))
- eps(mpfrResult) = 2^(-23)
- eps(input) = 2^(-24)
- |mpfrResult - input| = 2^(-23) + 2^(-24)

So, if we calculate the ulp error with respect to the input, ulp = (2^(-24) + 2^(-23)) / 2^(-24) = 3. If we calculate it with respect to the MPFR result, ulp = (2^(-24) + 2^(-23)) / 2^(-23) = 1.5.

I think our goal should be to treat the bit distances on either side of 2^N separately, where:

- N = max(exp(input), exp(mpfrResult))
- eps_input = 2^(exp(input) - 23)
- eps_mpfr = 2^(exp(mpfrResult) - 23)
- ulp = |2^N - input| / eps_input + |2^N - mpfrResult| / eps_mpfr

I think this formulation not only has the symmetry property, but also corresponds to the bit distance for close enough results (which do not differ in exponent by more than 1). For results farther apart, I don't think it matters. WDYT?
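The proposed formula can be written out as a short function and checked against the three worked examples. This is a minimal sketch, assuming positive normal single-precision values whose exponents differ by at most 1, as the comment stipulates. `eps` and `split_ulp` are hypothetical names, and `std::ilogb` stands in for exp(·).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>

// Hypothetical eps(x): gap between |x| and the next float above it.
// Assumes x is finite and below the largest finite float.
float eps(float x) {
  const float ax = std::fabs(x);
  assert(ax < std::numeric_limits<float>::max());
  return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// Sketch of the proposed formula: each side of 2^N is measured in the eps
// of the value lying on that side, with N = max(exp(input), exp(mpfr)).
float split_ulp(float input, float mpfr) {
  assert(std::isnormal(input) && std::isnormal(mpfr));
  assert(input > 0.0f && mpfr > 0.0f);
  assert(std::abs(std::ilogb(input) - std::ilogb(mpfr)) <= 1);
  const int n = std::max(std::ilogb(input), std::ilogb(mpfr));
  const float p = std::ldexp(1.0f, n);  // 2^N
  return std::fabs(p - input) / eps(input) + std::fabs(p - mpfr) / eps(mpfr);
}
```

The three examples give 1, 1 and 2, which match the number of float steps between the two values in each case.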

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

When ulp = |input - mpfrResult| / eps(input) This should also take care of numbers on either side of 0.

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

In either case, we won't have to worry about 0, because eps(0) == eps(smallest non-zero denormal number).
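The claim about eps(0) can be checked directly. This is a minimal sketch, assuming eps(x) is defined as the gap between |x| and the next larger float (a hypothetical definition, not necessarily the one in MPFRUtils.cpp) and that subnormals are supported (no flush-to-zero mode).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical eps(x): gap between |x| and the next float above it.
// Assumes x is finite, below the largest finite float, and that subnormal
// numbers are not flushed to zero.
float eps(float x) {
  const float ax = std::fabs(x);
  assert(ax < std::numeric_limits<float>::max());
  return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}
```

Both eps(0.0f) and eps(std::numeric_limits<float>::denorm_min()) equal denorm_min(), so eps is never zero and a difference divided by it is always defined. Under this definition, the two smallest subnormals of opposite sign are 2 units of eps apart.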

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

One main problem with using max(eps(input), float(eps(mpfrResult))) is that it will give us a false positive when:

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

So, let's do the ulp error calculation for this example using the scheme I proposed:

- eps_input = 2^(-23)
- eps_mpfr = 2^(-24)
- N = max(exp(input), exp(mpfrResult)) = 0
- ulp = |2^N - input| / eps_input + |2^N - mpfrResult| / eps_mpfr = 0 + 2^(-23) / 2^(-24) = 2

So, the calculated ulp error is as expected. The solution I am proposing is NOT to use
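The false positive and the proposed scheme can be compared side by side. This is a minimal sketch, assuming the values implied by the numbers above (input = 1, mpfrResult = 1 - 2^(-23), two float steps apart). `eps`, `ulp_wrt_max` and `split_ulp` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical eps(x): gap between |x| and the next float above it.
// Assumes x is finite and below the largest finite float.
float eps(float x) {
  const float ax = std::fabs(x);
  assert(ax < std::numeric_limits<float>::max());
  return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// Divides by the larger of the two eps values (the option criticised above).
float ulp_wrt_max(float input, float mpfr) {
  return std::fabs(input - mpfr) / std::max(eps(input), eps(mpfr));
}

// Split-at-2^N scheme. Assumes positive normal floats whose exponents
// differ by at most 1.
float split_ulp(float input, float mpfr) {
  const int n = std::max(std::ilogb(input), std::ilogb(mpfr));
  const float p = std::ldexp(1.0f, n);  // 2^N
  return std::fabs(p - input) / eps(input) + std::fabs(p - mpfr) / eps(mpfr);
}
```

For this pair, the max-based divisor reports 1 ulp, which would pass a 1-ulp tolerance. The split scheme reports 2, the actual number of float steps.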

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

What I meant to say is that, for all cases in which ulp = |input - mpfrResult| / eps(input) This will be correct for close enough numbers (numbers which have the same exponent).
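The same-exponent claim can be spot-checked. For two positive normal floats with the same exponent, the difference is an exact multiple of eps, and dividing by eps gives the bit-pattern distance. This is a minimal sketch; `eps` and `same_exponent_formula_matches` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical eps(x): gap between |x| and the next float above it.
// Assumes x is finite and below the largest finite float.
float eps(float x) {
  const float ax = std::fabs(x);
  assert(ax < std::numeric_limits<float>::max());
  return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// Hypothetical check: for positive normal floats sharing an exponent,
// |a - b| / eps(a) equals the distance between their bit patterns.
bool same_exponent_formula_matches(float a, float b) {
  assert(std::isnormal(a) && std::isnormal(b) && a > 0.0f && b > 0.0f);
  assert(std::ilogb(a) == std::ilogb(b));
  std::uint32_t ua = 0, ub = 0;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  const std::uint32_t bits = ua > ub ? ua - ub : ub - ua;
  return std::fabs(a - b) / eps(a) == static_cast<float>(bits);
}
```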

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

Sorry for the confusion. To summarize, consider 4 options:

- ulp_1 = |input - float(mpfrResult)| / eps_input
- ulp_2 = |input - float(mpfrResult)| / eps_mpfrResult
- ulp_3 = |input - float(mpfrResult)| / min(eps_input, eps_mpfrResult)
- ulp_4 = |input - float(mpfrResult)| / max(eps_input, eps_mpfrResult)

When eps_input == eps_mpfrResult, all 4 ulp functions return the same answer, so it doesn't matter which one we use in this case. On the other hand, on the edge cases:

So if we use eps_input (ulp_1), we risk accepting 1 as a 1-ulp approximation of 1 - 2^(-23), and if we use eps_mpfrResult (ulp_2), we risk accepting 1 - 2^(-23) as a 1-ulp approximation of 1, even though the two values are 2 floats apart. So I think ulp_3 is overall the correct one to use if the goal is to have at most 1 bit of difference compared to the MPFR results.
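The four options can be evaluated on the two edge cases named above. This is a minimal sketch, assuming single precision and that eps is the gap to the next larger float. `eps` and `ulp_1` through `ulp_4` are hypothetical names that mirror the list above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical eps(x): gap between |x| and the next float above it.
// Assumes x is finite and below the largest finite float.
float eps(float x) {
  const float ax = std::fabs(x);
  assert(ax < std::numeric_limits<float>::max());
  return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// The four candidate ulp functions from the summary.
float ulp_1(float input, float mpfr) { return std::fabs(input - mpfr) / eps(input); }
float ulp_2(float input, float mpfr) { return std::fabs(input - mpfr) / eps(mpfr); }
float ulp_3(float input, float mpfr) {
  return std::fabs(input - mpfr) / std::min(eps(input), eps(mpfr));
}
float ulp_4(float input, float mpfr) {
  return std::fabs(input - mpfr) / std::max(eps(input), eps(mpfr));
}
```

For 1 and 1 - 2^(-23), which are two float steps apart, ulp_3 reports 2 in both directions. ulp_1 and ulp_2 each report 1 in one direction, and ulp_4 reports 1 in both.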

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

Let me define ulp_5 = |2^N - input| / eps(input) + |2^N - mpfrResult| / eps(mpfrResult), where N = max(exponent(input), exponent(mpfrResult)). And now let's add a couple more rows and a column to the table you have:

So, the point I am trying to make is that

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

ulp_5 works well when 2^N lies between input and mpfrResult, but it returns the wrong answer when 2^N < min(input, mpfrResult).
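A concrete failure of ulp_5 occurs with two values that share an exponent, so that 2^N lies below both. This is a minimal sketch, assuming positive normal single-precision values whose exponents differ by at most 1. `combined_ulp` is a hypothetical combination of the two rules discussed in this thread: the same-exponent formula when exponents match, and ulp_5 otherwise. It is not an implementation from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical eps(x): gap between |x| and the next float above it.
// Assumes x is finite and below the largest finite float.
float eps(float x) {
  const float ax = std::fabs(x);
  assert(ax < std::numeric_limits<float>::max());
  return std::nextafter(ax, std::numeric_limits<float>::infinity()) - ax;
}

// ulp_5 as defined above: split the distance at 2^N.
float ulp_5(float input, float mpfr) {
  const int n = std::max(std::ilogb(input), std::ilogb(mpfr));
  const float p = std::ldexp(1.0f, n);  // 2^N
  return std::fabs(p - input) / eps(input) + std::fabs(p - mpfr) / eps(mpfr);
}

// Hypothetical combination: with equal exponents, 2^N is not between the
// two values, so fall back to |input - mpfr| / eps(input).
float combined_ulp(float input, float mpfr) {
  if (std::ilogb(input) == std::ilogb(mpfr))
    return std::fabs(input - mpfr) / eps(input);
  return ulp_5(input, mpfr);
}
```

For input = 1 + 2^(-22) and mpfrResult = 1 + 2^(-23), the two values are one float step apart. ulp_5 adds the two distances from 1 and returns 3, while the combined version returns 1.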

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

Yes. That is why, in a separate comment above, I said that if ulp = |input - mpfrResult| / eps(input) I have also said that, to keep it simple, we can apply the same formula when That said, I think we are both now talking about the same thing. What I want to understand next is how this discussion relates to the change being attempted in this patch. As in, can the change to the ULP error formula be done in a separate patch? IIUC, what you are trying to point out is, with

libc/utils/MPFRWrapper/MPFRUtils.cpp:301

Yes, updating the ULP error to better match the bit distance, with explanatory comments, in a follow-up patch SGTM.