This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Use the two-constant NR algorithm for refining estimates
ClosedPublic

Authored by nemanjai on Mar 30 2019, 4:39 PM.

Details

Summary

The single-constant algorithm produces infinities on a lot of denormal values. The precision of the two-constant algorithm is actually sufficient across the range of denormals. We will switch to that algorithm for now to avoid the infinities on denormals. In the future, we will re-evaluate the algorithm to find the optimal one for PowerPC.

Example:

$ cat a.c 
#include <stdio.h>
#include <math.h>
float __attribute__((noinline)) test(float f) { return sqrtf(f); }
int main(void) {
  return printf("sqrt(0.49e-43): %g\n", test(0.49e-43));
}

$ clang -Ofast a.c
$ ./a.out 
sqrt(0.49e-43): -inf

Desired output (and output with this patch applied):

$ ./a.out 
sqrt(0.49e-43): -inf

We have also run this through a reasonable approximation of the gamut of tests (1,000,000 tests per exponent over the full single-precision range vs. the precise HW instruction). Here are the results from this test (courtesy of @renenkel):

 0 ulps:  72 %
  1 ulps:  27
  2 ulps:  0.032
  3 ulps:  0
 >3 ulps:  0.35

max error = 2 ulps over full range
except returns NaN for +Inf

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai created this revision.Mar 30 2019, 4:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2019, 4:39 PM

I have a couple of nits. The main one being the test case for fma-mutate.ll.
I'm not sure if there is a way to use one const NR for just that one test.

lib/Target/PowerPC/PPC.td
140

nit:
I think it is "Newton-Raphson" not "Newton-Rhapson".

lib/Target/PowerPC/PPCISelLowering.cpp
10988

Same, nit:
Newton-Raphson

test/CodeGen/PowerPC/fma-mutate.ll
17

With this new patch this test no longer really tests all of the things that the author wanted to test.

As the author of the test mentions, the first transformation is not reasonable. However, the potential opportunity for this first transformation is no longer there (the fmr instruction for it is gone) with two const Newton-Raphson. We can at least detect the second legal transformation with the CHECK-NOT.

; CHECK: @foo3
; CHECK-NOT: fmr
; CHECK: xsmaddmdp
; CHECK: xsmaddadp
nemanjai updated this revision to Diff 197763.May 2 2019, 6:16 AM
nemanjai marked an inline comment as done.

Fixed the typos, added a check for no register move.

stefanp accepted this revision.May 2 2019, 11:10 AM

I only have one comment here and I think it can be fixed when the patch is committed.
LGTM.

lib/Target/PowerPC/PPCSubtarget.h
101

I think that this variable needs to be initialized to false in void PPCSubtarget::initializeEnvironment().

This revision is now accepted and ready to land.May 2 2019, 11:10 AM
nemanjai marked an inline comment as done.May 7 2019, 5:05 AM
nemanjai added inline comments.
lib/Target/PowerPC/PPC.td
140

Thanks. I'll fix it.

lib/Target/PowerPC/PPCSubtarget.h
101

Ah, good catch. I forgot about that.

This revision was automatically updated to reflect the committed changes.