Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
If this change makes sense, can we just switch comparisons with ulp error of 0.5 to comparisons with ulp error of 0.0?
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 | If we change to | input - float(mpfrValue) | / eps(input), we more or less calculating | input_bitfield - float(mpfrValue)_bitfield |, so in my opinion it's better to use
A concrete example is that ulp( input = float(1 - 2^(-24)), float(mpfrValue) = float(1) ) |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 | It seems like the problem you describe will occur irrespective of this change, no? |float(mpfrValue) - input|/eps(float(mpfrValue)) My reasoning is, the error should be relative to what we think is the correct/more accurate answer. Since we treat float(mpfrValue) to be the more accurate one, we should be calculating the error wrt its eps. Anything wrong with this reasoning? The point you raise about ulp error of 2 vs 1 is very valid. We already it special case it one way. Should we special case the other way around as well? And may be that is why you are saying we should generalize these special cases with a symmetric algorithm to calculate the ulp error? |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 | Yes, it is the same problem with / eps(float(mpfrValue)), with input is 2^n and float(mpfrValue) is 2^n - (eps(x)/2). So the only reasonable way that will return 1 ulp for both cases is to / min (eps(input), eps(float(mpfrValue)). And actually these edge cases are the only time / min(eps, eps) gives different answers than / eps(input) or eps(float(mpfrValue). |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 |
I want to discuss this example more. Assuming we are dealing with single precision floating point numbers and that mpfrResult = float(actualMpfrResult): input = float(1 - 2 ^ (-24)) mpfrResult = float(1); eps(mpfrResult) = 2 ^ (-23) eps(input) = 2 ^ (-24) |mpfrResult - input| = 2 ^ (-24) So, if we calculate ulp error wrt the input eps, then: ulp = (2 ^ (-24)) / (2 ^ (-24)) = 1 If we calculate ulp error wrt eps of mpfrResult, then: ulp = (2 ^ (-24)) / (2 ^ (-23)) = 1/2 Lets consider the other way around example: input = float(1 + 2 ^ (-23)) mpfrResult = float(1); eps(mpfrResult) = 2 ^ (-23) eps(input) = 2 ^ (-23) |mpfrResult - input| = 2 ^ (-23) And so the ulp error will be 1 in whichever way we calculate. Now, consider another example: input = float(1 - 2 ^ (-24)) mpfrResult = float(1 + 2 ^ (-23)); eps(mpfrResult) = 2 ^ (-23) eps(input) = 2 ^ (-24) |mpfrResult - input| = 2 ^ (-23) + 2 ^ (-24) So, if we calculate ulp error wrt the input eps, then: ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-24)) = 3 If we calculate ulp error wrt eps of mpfrResult, then: ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-23)) = 1.5 I think our goal should be to treat bit distances on either sides of 2^N uniformly where N = max(exp(input), exp(mpfrResult)). Then: N = max(exp(input), exp(mpfrResult)) eps_input = 2^(exp(input) - 23) eps_mpfr = 2^(exp(mpfrResult - 23) ulp = |2^N - input|/eps_input+ |2^N - mpfrResult|/eps_mpfr I think this formulation not only has the symmetry property, but also corresponds to the bit distances for close enough results (which do not differ in the exponent by more than 1). For results farther apart, I don't think it matters. WDYT? |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 | When exp(input) == exp(mpfrResult), the formula can be simply: ulp = |input - mpfrResult|/eps(input) This should also take care of numbers on either side of 0. |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 | In either case, we won't have to worry about 0, because eps(0) == eps( smallest non-zero denormal number). | |
301 | One main problem with using max( eps(input), float(eps(mpfrResult)) ) is that it will give us a false-positive when: |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 |
So, lets do ulp error calculation for this example using the scheme I proposed: eps_input = 2 ^ (-23) eps_mpfr = 2 ^ (-24) N = max(exp(input), exp(mpfrResult)) = 0 ulp = |2^N - input|/eps_input + |2^N - mpfrResult|/eps_mpfr = 0 + (2 ^ (-23))/(2 ^ (-24) = 2 So, the ulp error calculated is as expected. The solution I am proposing is NOT to use max(eps(input), eps(mpfrResult)). May be you are misreading the step which calculates N? | |
301 |
What I meant to say is that, for all cases in which input and mpfrResult differ in sign, we can use the simple formula: ulp = |input - mpfrResult|/eps(input) This will be correct for close enough numbers (numbers which have the same exponent). |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
301 | Sorry for the confusion. So let summary as following: consider 4 options: ulp_1 = | input - float(mpfrResult) | / eps_input ulp_2 = | input - float(mpfrResult) | / eps_mpfrResult ulp_3 = | input - float(mpfrResult) | / min( eps_input, eps_mpfrResult ) ulp_4 = | input - float(mpfrResult) | / max( eps_input, eps_mpfrResult ) When eps_input == eps(mpfrResult), all 4 ulp functions will return the same answer, so it doesn't matter which one to use in this case. On the other hand, on the edge cases:
So if we using eps_input (ulp_1), we will risk accepting (1 approximating 1 - 2^(-23) with 1 bit of accuracy) and with eps_mpfrResult, we will risk accepting (1 - 2^(-23) approximating 1 with 1 bit of accuracy). So I think using ulp_3 overall is the correct one to use if the goal is to have at most 1 bit difference compared to mpfr results. |
libc/utils/MPFRWrapper/MPFRUtils.cpp | |||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
301 | Let me define a ulp_5 as follows: ulp_5 = |2^N - input| / eps(input) + |2^N - mpfrResult| / eps(mpfrResult) where: N = max(exponent(input), exponent(mpfrResult)) And now lets add a couple of more rows and a column to the table you have:
So, the point I am trying to make is that ulp_5 captures the bit distance better than the other definitions of ulp error. For example, ulp_3 is overestimating the bit distance. |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 | ulp_5 will work well when 2^N is between input and mpfrResult, but it will return wrong answer when 2^N < min( input, mpfrResult ). |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 | Yes. That is why, in a separate comment above, I said that if exponent(input) == exponent(mpfrResult), then ulp error should be calculated as: ulp = |input - mpfrResult|/eps(input) I have also said that, to keep it simple, we can apply the same formula when input and mpfrResult differ in sign. That said, I think we both are now talking about the same thing. What I want to understand next is, how is this discussion related to change being attempted in this patch? As in, can the change to the ULP error formula be done in a separate patch? IIUC, what you are trying to point out is, with mpfrResult rounded to the target floating point format as done in this change, we can now make ULP error match the bit distance better? |
libc/utils/MPFRWrapper/MPFRUtils.cpp | ||
---|---|---|
301 | Yes, updating the ULP error to match the bit distance better with explanation comments in a followup patch SGTM. |
If we change to | input - float(mpfrValue) | / eps(input), we more or less calculating | input_bitfield - float(mpfrValue)_bitfield |, so in my opinion it's better to use
min (eps(input), eps(float(mpfrValue))), since that would avoid the case where float(mpfrValue) is 2^n and input is 2^n - (eps(x)/2) which is representable:
A concrete example is that ulp( input = float(1 - 2^(-24)), float(mpfrValue) = float(1) )
Moreover, another advantage of using min (eps, eps) is that the ulp function will then be symmetric: ulp(a, b) = ulp(b, a).