This is an archive of the discontinued LLVM Phabricator instance.

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	If we change to \| input - float(mpfrValue) \| / eps(input), we more or less calculating \| input_bitfield - float(mpfrValue)_bitfield \|, so in my opinion it's better to use min (eps(input), eps(float(mpfrValue))), since that would avoid the case where float(mpfrValue) is 2^n and input is 2^n - (eps(x)/2) which is representable: if we use the eps(input), the calculated ulp will be 2, if we use min ( eps(input), eps(float(mpfrValue)) ), the calculated ulp will be 1. A concrete example is that ulp( input = float(1 - 2^(-24)), float(mpfrValue) = float(1) ) Moreover, another advantage of using min (eps, eps) is that the ulp function will then be symmetric: ulp(a, b) = ulp(b, a).

sivachandra added inline comments.Jun 21 2021, 10:27 PM

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	It seems like the problem you describe will occur irrespective of this change, no? Also, why should we want `ulp(b, a) = ulp(a, b)`? Thinking of it now, it seems like we should define `ulp` error to be: \|float(mpfrValue) - input\|/eps(float(mpfrValue)) My reasoning is, the error should be relative to what we think is the correct/more accurate answer. Since we treat `float(mpfrValue)` to be the more accurate one, we should be calculating the error wrt its `eps`. Anything wrong with this reasoning? The point you raise about ulp error of 2 vs 1 is very valid. We already it special case it one way. Should we special case the other way around as well? And may be that is why you are saying we should generalize these special cases with a symmetric algorithm to calculate the ulp error?

lntue added inline comments.Jun 22 2021, 7:37 AM

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	Yes, it is the same problem with / eps(float(mpfrValue)), with input is 2^n and float(mpfrValue) is 2^n - (eps(x)/2). So the only reasonable way that will return 1 ulp for both cases is to / min (eps(input), eps(float(mpfrValue)). And actually these edge cases are the only time / min(eps, eps) gives different answers than / eps(input) or eps(float(mpfrValue).

sivachandra added inline comments.Jun 22 2021, 8:28 PM

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	If we change to \| input - float(mpfrValue) \| / eps(input), we more or less calculating \| input_bitfield - float(mpfrValue)_bitfield \|, so in my opinion it's better to use min (eps(input), eps(float(mpfrValue))), since that would avoid the case where float(mpfrValue) is 2^n and input is 2^n - (eps(x)/2) which is representable: if we use the eps(input), the calculated ulp will be 2, if we use min ( eps(input), eps(float(mpfrValue)) ), the calculated ulp will be 1. A concrete example is that ulp( input = float(1 - 2^(-24)), float(mpfrValue) = float(1) ) I want to discuss this example more. Assuming we are dealing with single precision floating point numbers and that `mpfrResult = float(actualMpfrResult)`: input = float(1 - 2 ^ (-24)) mpfrResult = float(1); eps(mpfrResult) = 2 ^ (-23) eps(input) = 2 ^ (-24) \|mpfrResult - input\| = 2 ^ (-24) So, if we calculate ulp error wrt the input `eps`, then: ulp = (2 ^ (-24)) / (2 ^ (-24)) = 1 If we calculate ulp error wrt `eps` of `mpfrResult`, then: ulp = (2 ^ (-24)) / (2 ^ (-23)) = 1/2 Lets consider the other way around example: input = float(1 + 2 ^ (-23)) mpfrResult = float(1); eps(mpfrResult) = 2 ^ (-23) eps(input) = 2 ^ (-23) \|mpfrResult - input\| = 2 ^ (-23) And so the ulp error will be 1 in whichever way we calculate. Now, consider another example: input = float(1 - 2 ^ (-24)) mpfrResult = float(1 + 2 ^ (-23)); eps(mpfrResult) = 2 ^ (-23) eps(input) = 2 ^ (-24) \|mpfrResult - input\| = 2 ^ (-23) + 2 ^ (-24) So, if we calculate ulp error wrt the input `eps`, then: ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-24)) = 3 If we calculate ulp error wrt `eps` of `mpfrResult`, then: ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-23)) = 1.5 I think our goal should be to treat bit distances on either sides of `2^N` uniformly where `N = max(exp(input), exp(mpfrResult))`. Then: N = max(exp(input), exp(mpfrResult)) eps_input = 2^(exp(input) - 23) eps_mpfr = 2^(exp(mpfrResult - 23) ulp = \|2^N - input\|/eps_input+ \|2^N - mpfrResult\|/eps_mpfr I think this formulation not only has the symmetry property, but also corresponds to the bit distances for close enough results (which do not differ in the exponent by more than 1). For results farther apart, I don't think it matters. WDYT?

sivachandra added inline comments.Jun 22 2021, 8:52 PM

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	When `exp(input) == exp(mpfrResult)`, the formula can be simply: ulp = \|input - mpfrResult\|/eps(input) This should also take care of numbers on either side of 0.

lntue added inline comments.Jun 23 2021, 5:49 AM

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	In either case, we won't have to worry about 0, because eps(0) == eps( smallest non-zero denormal number).
301	One main problem with using max( eps(input), float(eps(mpfrResult)) ) is that it will give us a false-positive when: input = float(1) and float(mpfrResult) = float(1 - 2^(-23)) Their representation differ by last 2 bits, but ulp calculation will return 1 if we use max eps.

sivachandra added inline comments.Jun 23 2021, 9:25 AM

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	One main problem with using max( eps(input), float(eps(mpfrResult)) ) is that it will give us a false-positive when: input = float(1) and float(mpfrResult) = float(1 - 2^(-23)) Their representation differ by last 2 bits, but ulp calculation will return 1 if we use max eps. So, lets do ulp error calculation for this example using the scheme I proposed: eps_input = 2 ^ (-23) eps_mpfr = 2 ^ (-24) N = max(exp(input), exp(mpfrResult)) = 0 ulp = \|2^N - input\|/eps_input + \|2^N - mpfrResult\|/eps_mpfr = 0 + (2 ^ (-23))/(2 ^ (-24) = 2 So, the ulp error calculated is as expected. The solution I am proposing is NOT to use `max(eps(input), eps(mpfrResult))`. May be you are misreading the step which calculates `N`?
301	In either case, we won't have to worry about 0, because eps(0) == eps( smallest non-zero denormal number). What I meant to say is that, for all cases in which `input` and `mpfrResult` differ in sign, we can use the simple formula: ulp = \|input - mpfrResult\|/eps(input) This will be correct for close enough numbers (numbers which have the same exponent).

lntue added inline comments.Jun 23 2021, 9:48 AM

libc/utils/MPFRWrapper/MPFRUtils.cpp

301

Sorry for the confusion. So let summary as following: consider 4 options:

ulp_1 = | input - float(mpfrResult) | / eps_input
ulp_2 = | input - float(mpfrResult) | / eps_mpfrResult
ulp_3 = | input - float(mpfrResult) | / min( eps_input, eps_mpfrResult )
ulp_4 = | input - float(mpfrResult) | / max( eps_input, eps_mpfrResult )

When eps_input == eps(mpfrResult), all 4 ulp functions will return the same answer, so it doesn't matter which one to use in this case.

On the other hand, on the edge cases:

input	float(mpfrResult)	eps_input	eps_mpfrResult	ulp_1	ulp_2	ulp_3	ulp_4
1	1 - 2^(-23)	2^(-23)	2^(-24)	1	2	2	1
1 - 2^(-23)	1	2^(-24)	2^(-23)	2	1	2	1

So if we using eps_input (ulp_1), we will risk accepting (1 approximating 1 - 2^(-23) with 1 bit of accuracy) and with eps_mpfrResult, we will risk accepting (1 - 2^(-23) approximating 1 with 1 bit of accuracy).

So I think using ulp_3 overall is the correct one to use if the goal is to have at most 1 bit difference compared to mpfr results.

sivachandra added inline comments.Jun 23 2021, 10:15 AM

libc/utils/MPFRWrapper/MPFRUtils.cpp

301

Let me define a ulp_5 as follows:

ulp_5 = |2^N - input| / eps(input) + |2^N - mpfrResult| / eps(mpfrResult)

where:

N = max(exponent(input), exponent(mpfrResult))

And now lets add a couple of more rows and a column to the table you have:

input	float(mpfrResult)	eps_input	eps_mpfrResult	ulp_1	ulp_2	ulp_3	ulp_4	ulp_5
1	1 - 2^(-23)	2^(-23)	2^(-24)	1	2	2	1	2
1 - 2^(-23)	1	2^(-24)	2^(-23)	2	1	2	1	2
1 - 2^(-24)	1 + 2 ^ (-23)	2^(-24)	2^(-23)	3	1.5	3	1.5	2
1 + 2^(-23)	1 - 2 ^ (-24)	2^(-23)	2^(-24)	1.5	3	3	1.5	2

So, the point I am trying to make is that ulp_5 captures the bit distance better than the other definitions of ulp error. For example, ulp_3 is overestimating the bit distance.

lntue added inline comments.Jun 23 2021, 10:42 AM

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	ulp_5 will work well when 2^N is between input and mpfrResult, but it will return wrong answer when 2^N < min( input, mpfrResult ). For example, if input == 1 + 2^(-23) == mpfrResult, ulp_5 will return 2. ulp_3 does have overestimating bit distance, but as I mentioned, it does correctly capture whether the input and float(mpfrResult) are within 1-bit of each other.

sivachandra added inline comments.Jun 23 2021, 10:57 AM

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	Yes. That is why, in a separate comment above, I said that if `exponent(input) == exponent(mpfrResult)`, then ulp error should be calculated as: ulp = \|input - mpfrResult\|/eps(input) I have also said that, to keep it simple, we can apply the same formula when `input` and `mpfrResult` differ in sign. That said, I think we both are now talking about the same thing. What I want to understand next is, how is this discussion related to change being attempted in this patch? As in, can the change to the ULP error formula be done in a separate patch? IIUC, what you are trying to point out is, with `mpfrResult` rounded to the target floating point format as done in this change, we can now make ULP error match the bit distance better?

lntue accepted this revision.Jun 23 2021, 11:18 AM

lntue added inline comments.

libc/utils/MPFRWrapper/MPFRUtils.cpp
301	Yes, updating the ULP error to match the bit distance better with explanation comments in a followup patch SGTM.

This revision is now accepted and ready to land.Jun 23 2021, 11:18 AM

This revision was landed with ongoing or failed builds.Jun 23 2021, 1:30 PM

Closed by commit rGd5700bb69449: [libc] Calculate ulp error after rounding MPFR result to the result type. (authored by sivachandra). · Explain Why

This revision was automatically updated to reflect the committed changes.

sivachandra added a commit: rGd5700bb69449: [libc] Calculate ulp error after rounding MPFR result to the result type..

sivachandra mentioned this in D105334: [libc] Make ULP error reflect the bit distance more closely..Jul 1 2021, 11:50 PM

Revision Contents

Path

Size

libc/

utils/

MPFRWrapper/

MPFRUtils.cpp

14 lines

Diff 353271

libc/utils/MPFRWrapper/MPFRUtils.cpp

Show All 18 Lines

#ifdef CUSTOM_MPFR_INCLUDER #ifdef CUSTOM_MPFR_INCLUDER

// Some downstream repos are monoliths carrying MPFR sources in their third // Some downstream repos are monoliths carrying MPFR sources in their third

// party directory. In such repos, including the MPFR header as // party directory. In such repos, including the MPFR header as

// `#include <mpfr.h>` is either disallowed or not possible. If that is the // `#include <mpfr.h>` is either disallowed or not possible. If that is the

// case, a file named `CustomMPFRIncluder.h` should be added through which the // case, a file named `CustomMPFRIncluder.h` should be added through which the

// MPFR header can be included in manner allowed in that repo. // MPFR header can be included in manner allowed in that repo.

#include "CustomMPFRIncluder.h" #include "CustomMPFRIncluder.h"

#else #else

#include <mpfr.h> #include <mpfr.h>

Lint: Pre-merge checks

clang-tidy: error: 'mpfr.h' file not found [clang-diagnostic-error]
not useful

Lint: Pre-merge checks: clang-tidy: error: 'mpfr.h' file not found [clang-diagnostic-error] [[https://github.

#endif #endif

template <typename T> using FPBits = __llvm_libc::fputil::FPBits<T>; template <typename T> using FPBits = __llvm_libc::fputil::FPBits<T>;

namespace __llvm_libc { namespace __llvm_libc {

namespace testing { namespace testing {

namespace mpfr { namespace mpfr {

▲ Show 20 Lines • Show All 229 Lines • ▼ Show 20 Lines public:

// argument) are reasonable close, and we will provide an upper bound // argument) are reasonable close, and we will provide an upper bound

// of ULP value for testing. Morever, most of the fractional parts of // of ULP value for testing. Morever, most of the fractional parts of

// ULP value do not matter much, so using double as the return type // ULP value do not matter much, so using double as the return type

// should be good enough. // should be good enough.

template <typename T> template <typename T>

cpp::EnableIfType<cpp::IsFloatingPointType<T>::Value, double> ulp(T input) { cpp::EnableIfType<cpp::IsFloatingPointType<T>::Value, double> ulp(T input) {

fputil::FPBits<T> bits(input); fputil::FPBits<T> bits(input);

MPFRNumber mpfrInput(input); MPFRNumber mpfrInput(input);

// When calculating error, we first round this number to the floating

// abs(value - input) // point type T and calculate the ulp error against this rounded number.

mpfr_sub(mpfrInput.value, value, mpfrInput.value, MPFR_RNDN); // The input is always either exact or rounded. So, we if compare

// with this number directly, we can end up with a large ULP error.

MPFRNumber thisRoundedToFloat(as<T>());

Lint: Pre-merge checks

clang-tidy: warning: invalid case style for variable 'thisRoundedToFloat' [readability-identifier-naming]
not useful

Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'thisRoundedToFloat' [readability…

// abs(thisRoundedToFloat - input)

mpfr_sub(mpfrInput.value, thisRoundedToFloat.value, mpfrInput.value,

MPFR_RNDN);

mpfr_abs(mpfrInput.value, mpfrInput.value, MPFR_RNDN); mpfr_abs(mpfrInput.value, mpfrInput.value, MPFR_RNDN);

// get eps(input) // get eps(input)

int epsExponent = bits.encoding.exponent - fputil::FPBits<T>::exponentBias - int epsExponent = bits.encoding.exponent - fputil::FPBits<T>::exponentBias -

fputil::MantissaWidth<T>::value; fputil::MantissaWidth<T>::value;

if (bits.encoding.exponent == 0) { if (bits.encoding.exponent == 0) {

// correcting denormal exponent // correcting denormal exponent

++epsExponent; ++epsExponent;

} else if ((bits.encoding.mantissa == 0) && (bits.encoding.exponent > 1) && } else if ((bits.encoding.mantissa == 0) && (bits.encoding.exponent > 1) &&

mpfr_less_p(value, mpfrInput.value)) { mpfr_less_p(thisRoundedToFloat.value, mpfrInput.value)) {

// when the input is exactly 2^n, distance (epsilon) between the input // when the input is exactly 2^n, distance (epsilon) between the input

// and the next floating point number is different from the distance to // and the next floating point number is different from the distance to

// the previous floating point number. So in that case, if the correct // the previous floating point number. So in that case, if the correct

// value from MPFR is smaller than the input, we use the smaller epsilon // value from MPFR is smaller than the input, we use the smaller epsilon

--epsExponent; --epsExponent;

} }

// Since eps(value) is of the form 2^e, instead of dividing such number, // Since eps(value) is of the form 2^e, instead of dividing such number,

// we multiply by its inverse 2^{-e}. // we multiply by its inverse 2^{-e}.

mpfr_mul_2si(mpfrInput.value, mpfrInput.value, -epsExponent, MPFR_RNDN); mpfr_mul_2si(mpfrInput.value, mpfrInput.value, -epsExponent, MPFR_RNDN);

lntueUnsubmitted

Not Done

If we change to | input - float(mpfrValue) | / eps(input), we more or less calculating | input_bitfield - float(mpfrValue)_bitfield |, so in my opinion it's better to use
min (eps(input), eps(float(mpfrValue))), since that would avoid the case where float(mpfrValue) is 2^n and input is 2^n - (eps(x)/2) which is representable:

if we use the eps(input), the calculated ulp will be 2,
if we use min ( eps(input), eps(float(mpfrValue)) ), the calculated ulp will be 1.

A concrete example is that ulp( input = float(1 - 2^(-24)), float(mpfrValue) = float(1) )
Moreover, another advantage of using min (eps, eps) is that the ulp function will then be symmetric: ulp(a, b) = ulp(b, a).

lntue: If we change to | input - float(mpfrValue) | / eps(input), we more or less calculating |…

sivachandraAuthorUnsubmitted

Done

It seems like the problem you describe will occur irrespective of this change, no?
Also, why should we want ulp(b, a) = ulp(a, b)? Thinking of it now, it seems like we should define ulp error to be:

|float(mpfrValue) - input|/eps(float(mpfrValue))

My reasoning is, the error should be relative to what we think is the correct/more accurate answer. Since we treat float(mpfrValue) to be the more accurate one, we should be calculating the error wrt its eps. Anything wrong with this reasoning? The point you raise about ulp error of 2 vs 1 is very valid. We already it special case it one way. Should we special case the other way around as well? And may be that is why you are saying we should generalize these special cases with a symmetric algorithm to calculate the ulp error?

sivachandra: It seems like the problem you describe will occur irrespective of this change, no? Also, why…

sivachandraAuthorUnsubmitted

Done

If we change to | input - float(mpfrValue) | / eps(input), we more or less calculating | input_bitfield - float(mpfrValue)_bitfield |, so in my opinion it's better to use
min (eps(input), eps(float(mpfrValue))), since that would avoid the case where float(mpfrValue) is 2^n and input is 2^n - (eps(x)/2) which is representable:

if we use the eps(input), the calculated ulp will be 2,

if we use min ( eps(input), eps(float(mpfrValue)) ), the calculated ulp will be 1.

A concrete example is that ulp( input = float(1 - 2^(-24)), float(mpfrValue) = float(1) )

I want to discuss this example more. Assuming we are dealing with single precision floating point numbers and that mpfrResult = float(actualMpfrResult):

input = float(1 - 2 ^ (-24))
mpfrResult = float(1);
eps(mpfrResult) = 2 ^ (-23)
eps(input) = 2 ^ (-24)

|mpfrResult - input| = 2 ^ (-24)

So, if we calculate ulp error wrt the input eps, then:

ulp = (2 ^ (-24)) / (2 ^ (-24)) = 1

If we calculate ulp error wrt eps of mpfrResult, then:

ulp = (2 ^ (-24)) / (2 ^ (-23)) = 1/2

Lets consider the other way around example:

input = float(1 + 2 ^ (-23))
mpfrResult = float(1);
eps(mpfrResult) = 2 ^ (-23)
eps(input) = 2 ^ (-23)

|mpfrResult - input| = 2 ^ (-23)

And so the ulp error will be 1 in whichever way we calculate.

Now, consider another example:

input = float(1 - 2 ^ (-24))
mpfrResult = float(1 + 2 ^ (-23));
eps(mpfrResult) = 2 ^ (-23)
eps(input) = 2 ^ (-24)

|mpfrResult - input| = 2 ^ (-23) + 2 ^ (-24)

So, if we calculate ulp error wrt the input eps, then:

ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-24)) = 3

If we calculate ulp error wrt eps of mpfrResult, then:

ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-23)) = 1.5

I think our goal should be to treat bit distances on either sides of 2^N uniformly where N = max(exp(input), exp(mpfrResult)). Then:

N = max(exp(input), exp(mpfrResult))
eps_input = 2^(exp(input) - 23)
eps_mpfr = 2^(exp(mpfrResult - 23)
ulp = |2^N - input|/eps_input+ |2^N - mpfrResult|/eps_mpfr

I think this formulation not only has the symmetry property, but also corresponds to the bit distances for close enough results (which do not differ in the exponent by more than 1). For results farther apart, I don't think it matters. WDYT?

sivachandra: > If we change to | input - float(mpfrValue) | / eps(input), we more or less calculating |…

lntueUnsubmitted

Not Done

Yes, it is the same problem with / eps(float(mpfrValue)), with input is 2^n and float(mpfrValue) is 2^n - (eps(x)/2). So the only reasonable way that will return 1 ulp for both cases is to / min (eps(input), eps(float(mpfrValue)). And actually these edge cases are the only time / min(eps, eps) gives different answers than / eps(input) or eps(float(mpfrValue).

lntue: Yes, it is the same problem with / eps(float(mpfrValue)), with input is 2^n and float…

sivachandraAuthorUnsubmitted

Done

When exp(input) == exp(mpfrResult), the formula can be simply:

ulp = |input - mpfrResult|/eps(input)

This should also take care of numbers on either side of 0.

sivachandra: When `exp(input) == exp(mpfrResult)`, the formula can be simply: ``` ulp = |input…

lntueUnsubmitted

Not Done

One main problem with using max( eps(input), float(eps(mpfrResult)) ) is that it will give us a false-positive when:
input = float(1) and float(mpfrResult) = float(1 - 2^(-23))
Their representation differ by last 2 bits, but ulp calculation will return 1 if we use max eps.

lntue: One main problem with using max( eps(input), float(eps(mpfrResult)) ) is that it will give us a…

lntueUnsubmitted

Not Done

In either case, we won't have to worry about 0, because eps(0) == eps( smallest non-zero denormal number).

lntue: In either case, we won't have to worry about 0, because eps(0) == eps( smallest non-zero…

sivachandraAuthorUnsubmitted

Done

In either case, we won't have to worry about 0, because eps(0) == eps( smallest non-zero denormal number).

What I meant to say is that, for all cases in which input and mpfrResult differ in sign, we can use the simple formula:

ulp = |input - mpfrResult|/eps(input)

This will be correct for close enough numbers (numbers which have the same exponent).

sivachandra: > In either case, we won't have to worry about 0, because eps(0) == eps( smallest non-zero…

sivachandraAuthorUnsubmitted

Done

One main problem with using max( eps(input), float(eps(mpfrResult)) ) is that it will give us a false-positive when:
input = float(1) and float(mpfrResult) = float(1 - 2^(-23))
Their representation differ by last 2 bits, but ulp calculation will return 1 if we use max eps.

So, lets do ulp error calculation for this example using the scheme I proposed:

eps_input = 2 ^ (-23)
eps_mpfr = 2 ^ (-24)
N = max(exp(input), exp(mpfrResult)) = 0
ulp = |2^N - input|/eps_input + |2^N - mpfrResult|/eps_mpfr = 0 + (2 ^ (-23))/(2 ^ (-24) = 2

So, the ulp error calculated is as expected. The solution I am proposing is NOT to use max(eps(input), eps(mpfrResult)). May be you are misreading the step which calculates N?

sivachandra: > One main problem with using max( eps(input), float(eps(mpfrResult)) ) is that it will give us…

lntueUnsubmitted

Not Done

Sorry for the confusion. So let summary as following: consider 4 options:

ulp_1 = | input - float(mpfrResult) | / eps_input
ulp_2 = | input - float(mpfrResult) | / eps_mpfrResult
ulp_3 = | input - float(mpfrResult) | / min( eps_input, eps_mpfrResult )
ulp_4 = | input - float(mpfrResult) | / max( eps_input, eps_mpfrResult )

When eps_input == eps(mpfrResult), all 4 ulp functions will return the same answer, so it doesn't matter which one to use in this case.

On the other hand, on the edge cases:

input	float(mpfrResult)	eps_input	eps_mpfrResult	ulp_1	ulp_2	ulp_3	ulp_4
1	1 - 2^(-23)	2^(-23)	2^(-24)	1	2	2	1
1 - 2^(-23)	1	2^(-24)	2^(-23)	2	1	2	1

So I think using ulp_3 overall is the correct one to use if the goal is to have at most 1 bit difference compared to mpfr results.

lntue: Sorry for the confusion. So let summary as following: consider 4 options: ``` ulp_1 = | input…

sivachandraAuthorUnsubmitted

Done

Let me define a ulp_5 as follows:

ulp_5 = |2^N - input| / eps(input) + |2^N - mpfrResult| / eps(mpfrResult)

where:

N = max(exponent(input), exponent(mpfrResult))

And now lets add a couple of more rows and a column to the table you have:

input	float(mpfrResult)	eps_input	eps_mpfrResult	ulp_1	ulp_2	ulp_3	ulp_4	ulp_5
1	1 - 2^(-23)	2^(-23)	2^(-24)	1	2	2	1	2
1 - 2^(-23)	1	2^(-24)	2^(-23)	2	1	2	1	2
1 - 2^(-24)	1 + 2 ^ (-23)	2^(-24)	2^(-23)	3	1.5	3	1.5	2
1 + 2^(-23)	1 - 2 ^ (-24)	2^(-23)	2^(-24)	1.5	3	3	1.5	2

So, the point I am trying to make is that ulp_5 captures the bit distance better than the other definitions of ulp error. For example, ulp_3 is overestimating the bit distance.

sivachandra: Let me define a `ulp_5` as follows: ``` ulp_5 = |2^N - input| / eps(input) + |2^N…

lntueUnsubmitted

Not Done

ulp_5 will work well when 2^N is between input and mpfrResult, but it will return wrong answer when 2^N < min( input, mpfrResult ).
For example, if input == 1 + 2^(-23) == mpfrResult, ulp_5 will return 2. ulp_3 does have overestimating bit distance, but as I mentioned, it does correctly capture whether the input and float(mpfrResult) are within 1-bit of each other.

lntue: ulp_5 will work well when 2^N is between input and mpfrResult, but it will return wrong answer…

sivachandraAuthorUnsubmitted

Done

Yes. That is why, in a separate comment above, I said that if exponent(input) == exponent(mpfrResult), then ulp error should be calculated as:

ulp = |input - mpfrResult|/eps(input)

I have also said that, to keep it simple, we can apply the same formula when input and mpfrResult differ in sign.

That said, I think we both are now talking about the same thing. What I want to understand next is, how is this discussion related to change being attempted in this patch? As in, can the change to the ULP error formula be done in a separate patch? IIUC, what you are trying to point out is, with mpfrResult rounded to the target floating point format as done in this change, we can now make ULP error match the bit distance better?

sivachandra: Yes. That is why, in a separate comment above, I said that if `exponent(input) == exponent…

lntueUnsubmitted

Not Done

Yes, updating the ULP error to match the bit distance better with explanation comments in a followup patch SGTM.

lntue: Yes, updating the ULP error to match the bit distance better with explanation comments in a…

return mpfrInput.as<double>(); return mpfrInput.as<double>();

} }

}; };

namespace internal { namespace internal {

template <typename InputType> template <typename InputType>

▲ Show 20 Lines • Show All 445 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[libc] Calculate ulp error after rounding MPFR result to the result type.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 353271

libc/utils/MPFRWrapper/MPFRUtils.cpp

[libc] Calculate ulp error after rounding MPFR result to the result type.
ClosedPublic