Download Raw Diff

Details

Reviewers

michaelrj
sivachandra
zimmermann6

Commits

rGaad04534c419: [libc] Implement correct rounding with all rounding modes for hypot functions.

Summary

Update the rounding logic for generic hypot function so that it will round correctly with all rounding modes.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lntue created this revision.Jan 18 2022, 11:12 AM

Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2022, 11:12 AM

Herald added subscribers: ecnelises, tschuett. · View Herald Transcript

lntue requested review of this revision.Jan 18 2022, 11:12 AM

Harbormaster completed remote builds in B144068: Diff 400919.Jan 18 2022, 11:17 AM

LGTM from a formatting perspective

Dear Tue,

Update the rounding logic for generic hypot function so that it will round correctly with all rounding modes.

I did stress this revision with my random testing code
(https://gitlab.inria.fr/zimmerma/math_accuracy) and did not find any
incorrectly rounded results, for all rounding modes, both for binary32
and binary64.

Instead of testing get_round() == FE_UPWARD to know the current rounding
mode, I wonder whether a test like 0x1p0f + 0x1p-24f != 0x1p0f would be
faster.

Paul

In D117590#3254008, @zimmermann6 wrote:

Dear Tue,

Update the rounding logic for generic hypot function so that it will round correctly with all rounding modes.

I did stress this revision with my random testing code
(https://gitlab.inria.fr/zimmerma/math_accuracy) and did not find any
incorrectly rounded results, for all rounding modes, both for binary32
and binary64.

Instead of testing get_round() == FE_UPWARD to know the current rounding
mode, I wonder whether a test like 0x1p0f + 0x1p-24f != 0x1p0f would be
faster.

Paul

Thanks Paul for testing the patch!

I've tried with using the 0x1p0f + 0x1p-24f != 0x1p0f instead of get_round() == FE_UPWARD, and it does make the perf tests on normal range ~ 5% faster.
But this is due to the compiler optimized away the expression (making it always False), and in turn, making the function not correctly rounded for all rounding modes any more: https://godbolt.org/z/87z4bWE9P

And if feel like if we add extra stuff to prevent the compiler from optimizing the expression away, it would bring the performance back to what we got with get_round() == FE_UPWARD.
This is also belong to the exceptional cases where we short-circuit the results, and so at least any changes inside would not affect the worst case performance.

Dear Tue,

I've tried with using the 0x1p0f + 0x1p-24f != 0x1p0f instead of get_round() == FE_UPWARD, and it does make the perf tests on normal range ~ 5% faster.
But this is due to the compiler optimized away the expression (making it always False), and in turn, making the function not correctly rounded for all rounding modes any more: https://godbolt.org/z/87z4bWE9P

yes, you need to add -frounding-math to the compiler options.

And if feel like if we add extra stuff to prevent the compiler from optimizing the expression away, it would bring the performance back to what we got with get_round() == FE_UPWARD.
This is also belong to the exceptional cases where we short-circuit the results, and so at least any changes inside would not affect the worst case performance.

please can you try with -frounding-math and check if the perf tests are
faster and slower?

Paul

Use a faster check for FE_UPWARD rounding mode.

Herald added a subscriber: mgorny. · View Herald TranscriptJan 19 2022, 8:19 AM

In D117590#3254772, @zimmermann6 wrote:

Dear Tue,

I've tried with using the 0x1p0f + 0x1p-24f != 0x1p0f instead of get_round() == FE_UPWARD, and it does make the perf tests on normal range ~ 5% faster.
But this is due to the compiler optimized away the expression (making it always False), and in turn, making the function not correctly rounded for all rounding modes any more: https://godbolt.org/z/87z4bWE9P

yes, you need to add -frounding-math to the compiler options.

And if feel like if we add extra stuff to prevent the compiler from optimizing the expression away, it would bring the performance back to what we got with get_round() == FE_UPWARD.
This is also belong to the exceptional cases where we short-circuit the results, and so at least any changes inside would not affect the worst case performance.

please can you try with -frounding-math and check if the perf tests are
faster and slower?

Paul

Thanks Paul! I've added -fround-math and it worked correctly while maintaining 5% overall performance improvement compared to get_round().

I've updated the patch accordingly. You might have to redo the stress tests again to make sure everything is still working properly.

Thanks!

Harbormaster completed remote builds in B144307: Diff 401251.Jan 19 2022, 8:54 AM

Remove warnings for perf tests.

sivachandra added inline comments.Jan 19 2022, 10:39 AM

libc/test/src/math/HypotTest.h
67	Is the call to `func` happening with the intended rounding mode?

lntue added inline comments.Jan 19 2022, 11:03 AM

libc/test/src/math/HypotTest.h
67	Yes, by putting the call inside the macro, it will be put by the macro after the rounding mode is set before it is evaluated. If it does not work as intended, the assertions with different rounding modes will definitely catch it, because most of the outputs are not representable in the floating point, and hence rounding up/down will guarantee to give different results.

Please wait for @zimmermann6 to give his green light.

This revision is now accepted and ready to land.Jan 19 2022, 11:07 AM

Harbormaster completed remote builds in B144360: Diff 401323.Jan 19 2022, 12:21 PM

I still get warnings with the latest revision:

/localdisk/zimmerma/llvm-project/libc/src/__support/FPUtil/Hypot.h:149:22: warning: hexadecimal floating literals are a C++17 feature [-Wc++17-extensions]
    if ((y != 0) && (0x1p0f + 0x1p-24f != 0x1p0f)) {
                     ^

the stress tests were successful (for all four rounding modes, both in single and double precision).
Thus I am ok with this version, thanks!

after fixing my stress program I was able to find one value which does not seem to be correctly rounded (for binary32 and rounding up):

zimmerma@biscotte:~/svn/tbd/20/src/binary32$ CFLAGS=-DCHECK_CR LLVM=llvm-project VERBOSE=-v RND=rndu ./doitb.llvm hypot 1000
Checking hypot with llvm-project and rndu
Using seed 1076573
NEW hypot 0 -1 0x1.ffffecp-1,-0x1.000002p+27 [1.00] 1 1
libm gives 0x1.000002p+27
mpfr gives 0x1.000004p+27

Please can you confirm?

This revision now requires changes to proceed.Jan 20 2022, 2:52 AM

I got similar results with binary64:

Checking hypot with llvm-project and rndu
Using seed 1078001
NEW hypot 0 -1 0x1.ccbbbcfef3c02p-523,0x1.924bf639c1a94p+500 [1.00] 1 1
libm gives 0x1.924bf639c1a94p+500
mpfr gives 0x1.924bf639c1a95p+500

Fix compiler flags that were added to the wrong entrypoints.

In D117590#3257664, @zimmermann6 wrote:
after fixing my stress program I was able to find one value which does not seem to be correctly rounded (for binary32 and rounding up):
zimmerma@biscotte:~/svn/tbd/20/src/binary32$ CFLAGS=-DCHECK_CR LLVM=llvm-project VERBOSE=-v RND=rndu ./doitb.llvm hypot 1000
Checking hypot with llvm-project and rndu
Using seed 1076573
NEW hypot 0 -1 0x1.ffffecp-1,-0x1.000002p+27 [1.00] 1 1
libm gives 0x1.000002p+27
mpfr gives 0x1.000004p+27
Please can you confirm?

Thanks Paul for catching the problem! For some reason before my previous patch update, merging to the head had put the -fround-math compiler flags into the wrong entrypoints that I didn't notice. After moving them back to hypotf and hypot, the results are correct on my side now.

Harbormaster completed remote builds in B144570: Diff 401620.Jan 20 2022, 6:43 AM

I'm ok with the new revision. However I see there are still some calls to get_round(). Did you try to replace them by floating-point operations?

You might also want to add the following hard-to-round cases (for binary32) in your test cases:

/* the following are hard-to-round cases with many identical bits after       
   the round bit */
{0x1.900004p+34,0x1.400002p+23}, /* 45 identical bits */
{0x1.05555p+34,0x1.bffffep+23},  /* 44 identical bits */
{0x1.e5fffap+34,0x1.affffep+23}, /* 45 identical bits */
{0x1.260002p+34,0x1.500002p+23}, /* 45 identical bits */
{0x1.fffffap+34,0x1.fffffep+23}, /* 45 identical bits */
{0x1.8ffffap+34,0x1.3ffffep+23}, /* 45 identical bits */
{0x1.87fffcp+35,0x1.bffffep+23}, /* 47 identical bits */

By the way, none of the other libraries is correctly rounded for binary32, here are the corresponding worst cases:

/* hypot(x,y) */
{0x1.b8e50ap-52,-0x1.db1e78p-64},   /* GNU libc       0.500001 */
{0x1.03b54cp-33,0x1.6ca6bep-45},    /* icc            0.500001 */
{0x1.e2eff6p+97,-0x1.044cb2p+108},  /* AMD LibM       0.500001 */
{-0x1.6b05c4p-127,0x1.6b3146p-126}, /* Newlib         1.20805 */
{-0x1.6b05c4p-127,0x1.6b3146p-126}, /* OpenLibm       1.20805 */
{0x1.26b188p-127,-0x1.a4f2fp-128},  /* Musl           0.926707 */
{0x1.e2eff6p+97,-0x1.044cb2p+108},  /* Darwin 20.4.0  0.500001 */

This revision is now accepted and ready to land.Jan 20 2022, 7:31 AM

wc.1231 KBDownload

attached is a file with 1200 binary32 exact cases with ulp(x)=2^12*ulp(y), x^2+y^2=z^2 having up to 72 bits. You might add them to your test cases.

Add more hard-to-round tests.

Add even more hard-to-round tests from Paul.

In D117590#3258279, @zimmermann6 wrote:

I'm ok with the new revision. However I see there are still some calls to get_round(). Did you try to replace them by floating-point operations?

You might also want to add the following hard-to-round cases (for binary32) in your test cases:

/* the following are hard-to-round cases with many identical bits after       
   the round bit */
{0x1.900004p+34,0x1.400002p+23}, /* 45 identical bits */
{0x1.05555p+34,0x1.bffffep+23},  /* 44 identical bits */
{0x1.e5fffap+34,0x1.affffep+23}, /* 45 identical bits */
{0x1.260002p+34,0x1.500002p+23}, /* 45 identical bits */
{0x1.fffffap+34,0x1.fffffep+23}, /* 45 identical bits */
{0x1.8ffffap+34,0x1.3ffffep+23}, /* 45 identical bits */
{0x1.87fffcp+35,0x1.bffffep+23}, /* 47 identical bits */

By the way, none of the other libraries is correctly rounded for binary32, here are the corresponding worst cases:

/* hypot(x,y) */
{0x1.b8e50ap-52,-0x1.db1e78p-64},   /* GNU libc       0.500001 */
{0x1.03b54cp-33,0x1.6ca6bep-45},    /* icc            0.500001 */
{0x1.e2eff6p+97,-0x1.044cb2p+108},  /* AMD LibM       0.500001 */
{-0x1.6b05c4p-127,0x1.6b3146p-126}, /* Newlib         1.20805 */
{-0x1.6b05c4p-127,0x1.6b3146p-126}, /* OpenLibm       1.20805 */
{0x1.26b188p-127,-0x1.a4f2fp-128},  /* Musl           0.926707 */
{0x1.e2eff6p+97,-0x1.044cb2p+108},  /* Darwin 20.4.0  0.500001 */

Thanks Paul for the test cases! I've added these and the other 1.2k cases in your attachment to the test.

I have also tried replacing the final get_round with the corresponding floating point operations, but for some reason, -O3 always partially optimized it away even with -frounding-math flag together, and that make the results incorrect.
If I change the flag to -O2, then it is correct, but I see virtually no performance improvement. So I decided to keep the get_round() at the end so that it is easier to read and less sensitive to optimization flags.

Harbormaster completed remote builds in B144610: Diff 401673.Jan 20 2022, 10:32 AM

Closed by commit rGaad04534c419: [libc] Implement correct rounding with all rounding modes for hypot functions. (authored by lntue). · Explain WhyJan 20 2022, 10:33 AM

This revision was automatically updated to reflect the committed changes.

lntue added a commit: rGaad04534c419: [libc] Implement correct rounding with all rounding modes for hypot functions..

a performance graph is available at https://core-math.gitlabpages.inria.fr/graph_perf_hypotf.pdf

Dear Tue,

I have also tried replacing the final get_round with the corresponding floating point operations, but for some reason, -O3 always partially optimized it away even with -frounding-math flag together, and that make the results incorrect.
If I change the flag to -O2, then it is correct, but I see virtually no performance improvement. So I decided to keep the get_round() at the end so that it is easier to read and less sensitive to optimization flags.

this might be a compiler bug, since -frounding-math should be supported
whatever the optimization level (-O2 or -O3). You might want to investigate
further and report the bug if any.

Best regards,
Paul

this might be a compiler bug, since -frounding-math should be supported whatever the optimization level (-O2 or -O3). You might want to investigate further and report the bug if any.

Hi, what's the target in the context? Or can you show the partial IR after -frounding-math? Currently only X86/SystemZ/PowerPC in LLVM has rather complete support for rounding-mode-aware float operations.

In D117590#3269109, @qiucf wrote:

this might be a compiler bug, since -frounding-math should be supported whatever the optimization level (-O2 or -O3). You might want to investigate further and report the bug if any.

Hi, what's the target in the context? Or can you show the partial IR after -frounding-math? Currently only X86/SystemZ/PowerPC in LLVM has rather complete support for rounding-mode-aware float operations.

This is the error messages that we got on aarch64-ubuntu: Buildbot log

[1/3] Building CXX object projects/libc/src/math/generic/CMakeFiles/libc.src.math.generic.hypotf.dir/hypotf.cpp.o
clang-9: warning: optimization flag '-frounding-math' is not supported [-Wignored-optimization-argument]
[2/3] Building CXX object projects/libc/src/math/generic/CMakeFiles/libc.src.math.generic.hypot.dir/hypot.cpp.o
clang-9: warning: optimization flag '-frounding-math' is not supported [-Wignored-optimization-argument]

It looks like our buildbot for aarch64 uses clang-9 and -frounding-math is not supported there?

This is the error messages that we got on aarch64-ubuntu: Buildbot log https://lab.llvm.org/buildbot/#/builders/138/builds/16983/steps/4/logs/stdio

it would be better to print the inputs with %e, %g or %a in the checking code,
since with %f tiny inputs simply appear as 0:

Input decimal: x: 0.00000000000000000000000000000000000000000000000000 y: 1797693134862315708145274237317043567980705675258449965989174768031572607800285387605895586327668781715404589535143824642343213268894641827684675467035375169860499105765512820762454900903893289440758

This is an archive of the discontinued LLVM Phabricator instance.

[libc] Implement correct rounding with all rounding modes for hypot functions.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 400919

libc/src/__support/FPUtil/Hypot.h

libc/test/src/math/HypotTest.h

This is an archive of the discontinued LLVM Phabricator instance.

[libc] Implement correct rounding with all rounding modes for hypot functions.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 400919

libc/src/__support/FPUtil/Hypot.h

libc/test/src/math/HypotTest.h

[libc] Implement correct rounding with all rounding modes for hypot functions.
ClosedPublic