Update the rounding logic for generic hypot function so that it will round correctly with all rounding modes.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Dear Tue,
Update the rounding logic for generic hypot function so that it will round correctly with all rounding modes.
I did stress this revision with my random testing code
(https://gitlab.inria.fr/zimmerma/math_accuracy) and did not find any
incorrectly rounded results, for all rounding modes, both for binary32
and binary64.
Instead of testing get_round() == FE_UPWARD to know the current rounding
mode, I wonder whether a test like 0x1p0f + 0x1p-24f != 0x1p0f would be
faster.
Paul
Thanks Paul for testing the patch!
I've tried with using the 0x1p0f + 0x1p-24f != 0x1p0f instead of get_round() == FE_UPWARD, and it does make the perf tests on normal range ~ 5% faster.
But this is due to the compiler optimized away the expression (making it always False), and in turn, making the function not correctly rounded for all rounding modes any more: https://godbolt.org/z/87z4bWE9P
And if feel like if we add extra stuff to prevent the compiler from optimizing the expression away, it would bring the performance back to what we got with get_round() == FE_UPWARD.
This is also belong to the exceptional cases where we short-circuit the results, and so at least any changes inside would not affect the worst case performance.
Dear Tue,
I've tried with using the 0x1p0f + 0x1p-24f != 0x1p0f instead of get_round() == FE_UPWARD, and it does make the perf tests on normal range ~ 5% faster.
But this is due to the compiler optimized away the expression (making it always False), and in turn, making the function not correctly rounded for all rounding modes any more: https://godbolt.org/z/87z4bWE9P
yes, you need to add -frounding-math to the compiler options.
And if feel like if we add extra stuff to prevent the compiler from optimizing the expression away, it would bring the performance back to what we got with get_round() == FE_UPWARD.
This is also belong to the exceptional cases where we short-circuit the results, and so at least any changes inside would not affect the worst case performance.
please can you try with -frounding-math and check if the perf tests are
faster and slower?
Paul
Thanks Paul! I've added -fround-math and it worked correctly while maintaining 5% overall performance improvement compared to get_round().
I've updated the patch accordingly. You might have to redo the stress tests again to make sure everything is still working properly.
Thanks!
libc/test/src/math/HypotTest.h | ||
---|---|---|
67 | Is the call to func happening with the intended rounding mode? |
libc/test/src/math/HypotTest.h | ||
---|---|---|
67 | Yes, by putting the call inside the macro, it will be put by the macro after the rounding mode is set before it is evaluated. If it does not work as intended, the assertions with different rounding modes will definitely catch it, because most of the outputs are not representable in the floating point, and hence rounding up/down will guarantee to give different results. |
I still get warnings with the latest revision:
/localdisk/zimmerma/llvm-project/libc/src/__support/FPUtil/Hypot.h:149:22: warning: hexadecimal floating literals are a C++17 feature [-Wc++17-extensions] if ((y != 0) && (0x1p0f + 0x1p-24f != 0x1p0f)) { ^
the stress tests were successful (for all four rounding modes, both in single and double precision).
Thus I am ok with this version, thanks!
after fixing my stress program I was able to find one value which does not seem to be correctly rounded (for binary32 and rounding up):
zimmerma@biscotte:~/svn/tbd/20/src/binary32$ CFLAGS=-DCHECK_CR LLVM=llvm-project VERBOSE=-v RND=rndu ./doitb.llvm hypot 1000 Checking hypot with llvm-project and rndu Using seed 1076573 NEW hypot 0 -1 0x1.ffffecp-1,-0x1.000002p+27 [1.00] 1 1 libm gives 0x1.000002p+27 mpfr gives 0x1.000004p+27
Please can you confirm?
I got similar results with binary64:
Checking hypot with llvm-project and rndu Using seed 1078001 NEW hypot 0 -1 0x1.ccbbbcfef3c02p-523,0x1.924bf639c1a94p+500 [1.00] 1 1 libm gives 0x1.924bf639c1a94p+500 mpfr gives 0x1.924bf639c1a95p+500
Thanks Paul for catching the problem! For some reason before my previous patch update, merging to the head had put the -fround-math compiler flags into the wrong entrypoints that I didn't notice. After moving them back to hypotf and hypot, the results are correct on my side now.
I'm ok with the new revision. However I see there are still some calls to get_round(). Did you try to replace them by floating-point operations?
You might also want to add the following hard-to-round cases (for binary32) in your test cases:
/* the following are hard-to-round cases with many identical bits after the round bit */ {0x1.900004p+34,0x1.400002p+23}, /* 45 identical bits */ {0x1.05555p+34,0x1.bffffep+23}, /* 44 identical bits */ {0x1.e5fffap+34,0x1.affffep+23}, /* 45 identical bits */ {0x1.260002p+34,0x1.500002p+23}, /* 45 identical bits */ {0x1.fffffap+34,0x1.fffffep+23}, /* 45 identical bits */ {0x1.8ffffap+34,0x1.3ffffep+23}, /* 45 identical bits */ {0x1.87fffcp+35,0x1.bffffep+23}, /* 47 identical bits */
By the way, none of the other libraries is correctly rounded for binary32, here are the corresponding worst cases:
/* hypot(x,y) */ {0x1.b8e50ap-52,-0x1.db1e78p-64}, /* GNU libc 0.500001 */ {0x1.03b54cp-33,0x1.6ca6bep-45}, /* icc 0.500001 */ {0x1.e2eff6p+97,-0x1.044cb2p+108}, /* AMD LibM 0.500001 */ {-0x1.6b05c4p-127,0x1.6b3146p-126}, /* Newlib 1.20805 */ {-0x1.6b05c4p-127,0x1.6b3146p-126}, /* OpenLibm 1.20805 */ {0x1.26b188p-127,-0x1.a4f2fp-128}, /* Musl 0.926707 */ {0x1.e2eff6p+97,-0x1.044cb2p+108}, /* Darwin 20.4.0 0.500001 */
attached is a file with 1200 binary32 exact cases with ulp(x)=2^12*ulp(y), x^2+y^2=z^2 having up to 72 bits. You might add them to your test cases.
Thanks Paul for the test cases! I've added these and the other 1.2k cases in your attachment to the test.
I have also tried replacing the final get_round with the corresponding floating point operations, but for some reason, -O3 always partially optimized it away even with -frounding-math flag together, and that make the results incorrect.
If I change the flag to -O2, then it is correct, but I see virtually no performance improvement. So I decided to keep the get_round() at the end so that it is easier to read and less sensitive to optimization flags.
a performance graph is available at https://core-math.gitlabpages.inria.fr/graph_perf_hypotf.pdf
Dear Tue,
I have also tried replacing the final get_round with the corresponding floating point operations, but for some reason, -O3 always partially optimized it away even with -frounding-math flag together, and that make the results incorrect.
If I change the flag to -O2, then it is correct, but I see virtually no performance improvement. So I decided to keep the get_round() at the end so that it is easier to read and less sensitive to optimization flags.
this might be a compiler bug, since -frounding-math should be supported
whatever the optimization level (-O2 or -O3). You might want to investigate
further and report the bug if any.
Best regards,
Paul
this might be a compiler bug, since -frounding-math should be supported whatever the optimization level (-O2 or -O3). You might want to investigate further and report the bug if any.
Hi, what's the target in the context? Or can you show the partial IR after -frounding-math? Currently only X86/SystemZ/PowerPC in LLVM has rather complete support for rounding-mode-aware float operations.
This is the error messages that we got on aarch64-ubuntu: Buildbot log
[1/3] Building CXX object projects/libc/src/math/generic/CMakeFiles/libc.src.math.generic.hypotf.dir/hypotf.cpp.o
clang-9: warning: optimization flag '-frounding-math' is not supported [-Wignored-optimization-argument]
[2/3] Building CXX object projects/libc/src/math/generic/CMakeFiles/libc.src.math.generic.hypot.dir/hypot.cpp.o
clang-9: warning: optimization flag '-frounding-math' is not supported [-Wignored-optimization-argument]
It looks like our buildbot for aarch64 uses clang-9 and -frounding-math is not supported there?
This is the error messages that we got on aarch64-ubuntu: Buildbot log https://lab.llvm.org/buildbot/#/builders/138/builds/16983/steps/4/logs/stdio
it would be better to print the inputs with %e, %g or %a in the checking code,
since with %f tiny inputs simply appear as 0:
Input decimal: x: 0.00000000000000000000000000000000000000000000000000 y: 1797693134862315708145274237317043567980705675258449965989174768031572607800285387605895586327668781715404589535143824642343213268894641827684675467035375169860499105765512820762454900903893289440758
Is the call to func happening with the intended rounding mode?