The single-constant algorithm produces infinities on a lot of denormal values. The precision of the two-constant algorithm is actually sufficient across the range of denormals. We will switch to that algorithm for now to avoid the infinities on denormals. In the future, we will re-evaluate the algorithm to find the optimal one for PowerPC.
Example:
$ cat a.c #include <stdio.h> #include <math.h> float __attribute__((noinline)) test(float f) { return sqrtf(f); } int main(void) { return printf("sqrt(0.49e-43): %g\n", test(0.49e-43)); } $ clang -Ofast a.c $ ./a.out sqrt(0.49e-43): -inf
Desired output (and output with this patch applied):
$ ./a.out sqrt(0.49e-43): -inf
We have also run this through a reasonable approximation of the gamut of tests (1,000,000 tests per exponent over the full single-precision range vs. the precise HW instruction). Here are the results from this test (courtesy of @renenkel):
0 ulps: 72 % 1 ulps: 27 2 ulps: 0.032 3 ulps: 0 >3 ulps: 0.35 max error = 2 ulps over full range except returns NaN for +Inf
nit:
I think it is "Newton-Raphson" not "Newton-Rhapson".