Details

Reviewers

sivachandra
michaelrj
cqlauter
zimmermann6

Commits

rGf1ec99f973bd: [libc] Improve hypotf performance with different algorithm correctly rounded to…

Summary

Algorithm for hypotf: compute (a*a + b*b) in double precision, then use Dekker's algorithm to find the rounding error, and then correct the result after taking its square root.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lntue created this revision. Jan 25 2022, 8:52 AM

Herald added a project: Restricted Project. Jan 25 2022, 8:52 AM

Herald added subscribers: ecnelises, tschuett.

lntue requested review of this revision. Jan 25 2022, 8:52 AM

sivachandra added inline comments. Jan 25 2022, 9:19 AM

libc/src/math/generic/hypotf.cpp
24	Incorrect variable naming style at many places in this function.
34	A call to a target independent builtin for a standard function can lead to a call back to the libc. That is, compilers are free to call the `sqrt` function from the libc. We can refactor our `sqrt` implementation so that we can replace this call to `__builtin_sqrt` with a call to LLVM libc's `sqrt`.

lntue added inline comments. Jan 25 2022, 12:20 PM

libc/src/math/generic/hypotf.cpp
34	I refactored our `sqrt` implementation in https://reviews.llvm.org/D118173. I will wait for that patch to land.

I get some errors for rounding to nearest:

Difference for 0x1.faf49ep+25,0x1.480002p+23
llvm_hypot: 0x1.00c5bp+26
as_hypot:   0x1.00c5b2p+26
pz_hypot:   0x1.00c5b2p+26

libc/src/math/generic/hypotf.cpp
31–32	I hadn't seen that trick to compute the rounding error, do you have a reference? By the way, I'm not sure the reference to Dekker is appropriate. For me, Dekker's algorithm splits two floating-point numbers in two each, and computes their product (high + low part) using 4 multiplies.

This revision now requires changes to proceed. Jan 26 2022, 6:07 AM

vinc17 added a subscriber: vinc17. Jan 26 2022, 6:36 AM

vinc17 added inline comments.

libc/src/math/generic/hypotf.cpp
31–32	If I understand correctly, `err` should get the rounding error of the sum. The algorithm is known as TwoSum. It needs 6 operations, including the sum `sumSq`, and this is the same number of operations as you have. But with 6 add/sub operations, I proved that there is only one algorithm (up to obvious symmetries) that works. And the above one is different. Thus it will be sometimes incorrect. I think that if `xSq` and `ySq` are close to each other and their sum is not exact, then the above algorithm will give you twice the rounding error.
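
For reference, here is a minimal sketch of the classic six-operation TwoSum as it is usually presented in the floating-point literature. It is not code from this patch, and the names `SumErr` and `two_sum` are hypothetical. Note the sign convention: in this sketch `sum + err` equals `a + b` exactly (barring overflow), whereas the committed hypotf.cpp documents the opposite sign (`x_sq + y_sq = sum_sq - err`).

```cpp
#include <cmath>

// Hypothetical helper type: the rounded sum and its rounding error.
struct SumErr {
  double sum;
  double err;
};

// Classic branch-free TwoSum: six add/sub operations, including the sum.
// Assumes round-to-nearest, no overflow, and no value-changing compiler
// flags such as -ffast-math.
SumErr two_sum(double a, double b) {
  double s = a + b;              // rounded sum
  double a_virtual = s - b;      // the part of s that came from a
  double b_virtual = s - a_virtual; // the part of s that came from b
  double a_round = a - a_virtual;   // what was lost from a
  double b_round = b - b_virtual;   // what was lost from b
  return {s, a_round + b_round};
}
```

Unlike Fast2Sum, this variant does not need to know which operand is larger: `two_sum(1.0, std::ldexp(1.0, -60))` and `two_sum(std::ldexp(1.0, -60), 1.0)` both return a sum of 1.0 and an error term of 2^-60.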

Remove the check for non-zero error.

In D118157#3272333, @zimmermann6 wrote:
I get some errors for rounding to nearest:
Difference for 0x1.faf49ep+25,0x1.480002p+23
llvm_hypot: 0x1.00c5bp+26
as_hypot:   0x1.00c5b2p+26
pz_hypot:   0x1.00c5b2p+26

Thanks Paul for finding the error! For this example, the sum of squares is actually exact, but the result still needs to be corrected in order to avoid double rounding errors. After removing the check (err != 0) in line 36, the result is correct again.
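
The double rounding on this input can be reproduced with a small sketch. The assumptions are loud ones: `naive_hypotf` is a hypothetical stand-in for "take the square root in double, then round to float" and is not the patch's code; the inputs are written as the integers they represent (0x1.faf49ep+25 = 66447676 and 0x1.480002p+23 = 10747905); and the target is assumed to evaluate `double` arithmetic without excess precision.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical naive version: square root in double, then a second
// rounding to float. The two roundings together can be wrong.
float naive_hypotf(float x, float y) {
  double xd = x, yd = y;
  return static_cast<float>(std::sqrt(xd * xd + yd * yd));
}

// Exact integer check for the reported input pair.
// 67311300 is the midpoint between the neighbouring floats
// 67311296 (0x1.00c5bp+26) and 67311304 (0x1.00c5b2p+26).
bool exact_value_is_above_midpoint() {
  const uint64_t x = 66447676;   // 0x1.faf49ep+25
  const uint64_t y = 10747905;   // 0x1.480002p+23
  const uint64_t mid = 67311300;
  return x * x + y * y > mid * mid; // both sides fit in 64 bits
}
```

Here x*x + y*y is exactly mid*mid + 1, so the true result lies just above the midpoint and rounds to 67311304 (0x1.00c5b2p+26). The double-precision square root, however, rounds to exactly 67311300, and the tie is then broken towards the even float 67311296 (0x1.00c5bp+26), which matches the wrong `llvm_hypot` value reported above.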

lntue marked an inline comment as not done. Jan 26 2022, 8:18 AM

lntue added inline comments.

libc/src/math/generic/hypotf.cpp
31–32	Thanks Paul and Vincent for finding the issue with this! Actually I was trying to implement the Fast2Sum version since we are in radix 2. Moreover, since both `xSq` and `ySq` are non-negative, we have max(xSq, ySq) <= sumSq <= 2 max(xSq, ySq), and so `sumSq - max(xSq, ySq)` is exact. I was trying to implement it without a branch and thought that `(sumSq - min(xSq, ySq)) - max(xSq, ySq) = 0`, which is easily disproved by the following example. Consider single precision with `xSq = 1 + 2^(-23)` (I know it's not a square) and `ySq = 2^(-24)` in the default rounding mode, so that `sumSq = xSq + ySq = 1 + 2^(-22)`. Then:
sumSq - xSq = 2^(-23)
sumSq - ySq = 1 + 2^(-22) = sumSq
And hence:
(sumSq - xSq) - ySq = 2^(-24)
(sumSq - ySq) - xSq = 2^(-23)
((sumSq - xSq) - ySq) + ((sumSq - ySq) - xSq) != 2^(-24), which is the rounding error.
I have changed it back to the normal Fast2Sum implementation with branching, so the rounding error computation should be correct now.
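
The counterexample in this comment can be checked directly in single precision. The sketch below is only an illustration: the function names are hypothetical, `branchless_err` is a reconstruction of the expression described in the comment rather than the earlier diff itself, and round-to-nearest without excess precision is assumed.

```cpp
#include <cmath>

// Fast2Sum with a branch, using the same sign convention as the patch:
// the returned value is (rounded sum) - (exact sum).
float fast_two_sum_err(float x_sq, float y_sq) {
  float sum_sq = x_sq + y_sq;
  return (x_sq >= y_sq) ? (sum_sq - x_sq) - y_sq
                        : (sum_sq - y_sq) - x_sq;
}

// Branch-free attempt described in the comment: add both candidate
// expressions, hoping that one of them is always zero.
float branchless_err(float x_sq, float y_sq) {
  float sum_sq = x_sq + y_sq;
  float a = (sum_sq - x_sq) - y_sq;
  float b = (sum_sq - y_sq) - x_sq;
  return a + b;
}
```

With `x_sq = 1 + 2^(-23)` and `y_sq = 2^(-24)`, `fast_two_sum_err` returns 2^(-24), the true rounding error, while `branchless_err` returns 2^(-24) + 2^(-23), three times as much.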

Harbormaster completed remote builds in B145747: Diff 403268. Jan 27 2022, 2:31 AM

I'm still running semi-exhaustive tests; it takes some time. I wonder whether a full exhaustive test is possible, by comparing the LLVM implementation with the code from Alexei at https://core-math.gitlabpages.inria.fr/. On a 64-core machine (Intel Xeon Gold 6130 @ 2.10GHz), it takes 4.6s to check 2^33 pairs (x,y). If one tests only positive x,y and x>=y, an exhaustive comparison would have to check 2^61 pairs for each rounding mode, which would take less than 1.5 months using 10000 such machines. This would not be a proof, but the probability that both codes are wrong for the same inputs and give exactly the same wrong answer is quite small.

In D118157#3278638, @zimmermann6 wrote:

I'm still running semi-exhaustive tests; it takes some time. I wonder whether a full exhaustive test is possible, by comparing the LLVM implementation with the code from Alexei at https://core-math.gitlabpages.inria.fr/. On a 64-core machine (Intel Xeon Gold 6130 @ 2.10GHz), it takes 4.6s to check 2^33 pairs (x,y). If one tests only positive x,y and x>=y, an exhaustive comparison would have to check 2^61 pairs for each rounding mode, which would take less than 1.5 months using 10000 such machines. This would not be a proof, but the probability that both codes are wrong for the same inputs and give exactly the same wrong answer is quite small.

Or, if you don't mind it being slower, you can compare it with the shift-and-add algorithm implemented in LLVM libc that this one is trying to speed up, since that one can be proved mathematically to be correct.

Another option: since the idea of this algorithm is scalable, we could have a version of it for half precision (essentially just changing the data types and masks/constants), where it can be tested exhaustively. That should at least increase the confidence in single precision, and maybe in double precision later.

Use fputil::sqrt instead of __builtin_sqrtf.

Herald added a subscriber: mgorny. Jan 28 2022, 3:14 PM

Harbormaster completed remote builds in B146393: Diff 404181. Jan 28 2022, 3:19 PM

Fix variable names and ignore compiler warnings about C++17.

Add a quick return when the exponent difference is at least 2 more than the mantissa length.
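
A rough sketch of this quick return, for finite non-zero inputs only. The threshold 23 + 2 mirrors `fputil::MantissaWidth<float>::VALUE + 2` in the diff; using `std::ilogb` to read the exponents and the two function names are assumptions of this sketch, not the patch's code.

```cpp
#include <cmath>

// True when the binary exponents of x and y differ by at least 25.
// Assumes x and y are finite and non-zero, so std::ilogb returns an
// ordinary exponent rather than a special value.
bool exponents_far_apart(float x, float y) {
  long long ex = std::ilogb(x);
  long long ey = std::ilogb(y);
  long long diff = (ex > ey) ? (ex - ey) : (ey - ex);
  return diff >= 23 + 2;
}

// In that case the smaller input lies far below half an ulp of the
// larger one, so the float addition already rounds like hypot would.
float hypotf_far_apart(float x, float y) {
  return std::fabs(x) + std::fabs(y);
}
```

Returning the sum rather than simply the larger magnitude appears to be what keeps the directed rounding modes right: the exact hypot is slightly above the larger magnitude whenever the smaller input is non-zero, and so is the exact sum, so both round the same way.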

Harbormaster completed remote builds in B146508: Diff 404339. Jan 29 2022, 9:53 PM

lntue marked an inline comment as done. Jan 30 2022, 6:38 AM

the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.
However since it changed in the meantime, I don't have resources any more to review the new version.

In D118157#3283205, @zimmermann6 wrote:

the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.
However since it changed in the meantime, I don't have resources any more to review the new version.

Thanks Paul for checking! The new version only changes when exponent of x >= exponent of y + 25, so your verification should still hold.

Add exhaustive testing to test for inputs (x, y) with 2^23 <= x < 2^24, 2^(23 + 14) <= y < 2^(23 + 25).
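
The ranges above translate into contiguous ranges of IEEE 754 single-precision bit patterns, which is what makes them easy to enumerate. A minimal sketch of that mapping follows; the helper names are hypothetical, and the exponent bias 127 and mantissa width 23 are the standard binary32 parameters, the same constants that appear in the test's `(37U + 127U) << 23`.

```cpp
#include <cstdint>
#include <cstring>

// Bit pattern of the float 2^k, for a normal exponent k in [-126, 127].
constexpr uint32_t pow2_bits(int k) {
  return static_cast<uint32_t>(k + 127) << 23;
}

// Reinterpret a 32-bit pattern as a float without aliasing issues.
float float_from_bits(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

For example, every float in [2^23, 2^24) has a bit pattern in [`pow2_bits(23)`, `pow2_bits(24)`), which is 2^23 consecutive integers, and incrementing the pattern steps to the next representable float.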

Fix names in the exhaustive test.

Harbormaster completed remote builds in B146718: Diff 404641. Jan 31 2022, 11:44 AM

OK for the structuring aspects.

This revision is now accepted and ready to land. Jan 31 2022, 3:12 PM

In D118157#3283205, @zimmermann6 wrote:

the version of last Friday is fine for me: I did run exhaustive tests for 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 0 <= k <= 13.
However since it changed in the meantime, I don't have resources any more to review the new version.

@zimmermann6 : I've finished testing the remaining pairs: (x, y) with 2^23 <= y < 2^24, and 2^(23+k) <= x < 2^(24+k) for 14 <= k <= 24.

Sync to HEAD.

Harbormaster completed remote builds in B149322: Diff 408320. Feb 13 2022, 8:27 PM

Closed by commit rGf1ec99f973bd: [libc] Improve hypotf performance with different algorithm correctly rounded to… (authored by lntue). Feb 16 2022, 6:49 AM

This revision was automatically updated to reflect the committed changes.

lntue added a commit: rGf1ec99f973bd: [libc] Improve hypotf performance with different algorithm correctly rounded to….

Diff 409235

libc/src/math/generic/CMakeLists.txt

	add_entrypoint_object(			add_entrypoint_object(
	hypotf			hypotf
	SRCS			SRCS
	hypotf.cpp			hypotf.cpp
	HDRS			HDRS
	../hypotf.h			../hypotf.h
	DEPENDS			DEPENDS
	libc.src.__support.FPUtil.fputil			libc.src.__support.FPUtil.fputil
				libc.src.__support.FPUtil.sqrt
	COMPILE_OPTIONS			COMPILE_OPTIONS
	-O3			-O3
	)			)

	add_entrypoint_object(			add_entrypoint_object(
	fdim			fdim
	SRCS			SRCS
	fdim.cpp			fdim.cpp

libc/src/math/generic/hypotf.cpp

	//===-- Implementation of hypotf function ---------------------------------===//			//===-- Implementation of hypotf function ---------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	#include "src/math/hypotf.h"			#include "src/math/hypotf.h"
	#include "src/__support/FPUtil/Hypot.h"			#include "src/__support/FPUtil/BasicOperations.h"
				#include "src/__support/FPUtil/FPBits.h"
				#include "src/__support/FPUtil/sqrt.h"
	#include "src/__support/common.h"			#include "src/__support/common.h"

	namespace __llvm_libc {			namespace __llvm_libc {

	LLVM_LIBC_FUNCTION(float, hypotf, (float x, float y)) {			LLVM_LIBC_FUNCTION(float, hypotf, (float x, float y)) {
	return __llvm_libc::fputil::hypot(x, y);			using DoubleBits = fputil::FPBits<double>;
				using FPBits = fputil::FPBits<float>;

				FPBits x_bits(x), y_bits(y);

				uint16_t x_exp = x_bits.get_unbiased_exponent();
				uint16_t y_exp = y_bits.get_unbiased_exponent();
				uint16_t exp_diff = (x_exp > y_exp) ? (x_exp - y_exp) : (y_exp - x_exp);

				if (exp_diff >= fputil::MantissaWidth<float>::VALUE + 2) {
				return fputil::abs(x) + fputil::abs(y);
				}

				double xd = static_cast<double>(x);
				double yd = static_cast<double>(y);

				// These squares are exact.
				double x_sq = xd * xd;
				double y_sq = yd * yd;

				// Compute the sum of squares.
				double sum_sq = x_sq + y_sq;

				// Compute the rounding error with Fast2Sum algorithm:
				// x_sq + y_sq = sum_sq - err
				double err = (x_sq >= y_sq) ? (sum_sq - x_sq) - y_sq : (sum_sq - y_sq) - x_sq;

				// Take sqrt in double precision.
				DoubleBits result(fputil::sqrt(sum_sq));

				if (!DoubleBits(sum_sq).is_inf_or_nan()) {
				// Correct rounding.
				double r_sq = static_cast<double>(result) * static_cast<double>(result);
				double diff = sum_sq - r_sq;
				constexpr uint64_t mask = 0x0000'0000'3FFF'FFFFULL;
				uint64_t lrs = result.uintval() & mask;

				if (lrs == 0x0000'0000'1000'0000ULL && err < diff) {
				result.bits |= 1ULL;
				} else if (lrs == 0x0000'0000'3000'0000ULL && err > diff) {
				result.bits -= 1ULL;
				}
				} else {
				FPBits bits_x(x), bits_y(y);
				if (bits_x.is_inf_or_nan() || bits_y.is_inf_or_nan()) {
				if (bits_x.is_inf() || bits_y.is_inf())
				return static_cast<float>(FPBits::inf());
				if (bits_x.is_nan())
				return x;
				return y;
				}
				}

				return static_cast<float>(static_cast<double>(result));
	}			}

	} // namespace __llvm_libc			} // namespace __llvm_libc

libc/test/src/math/differential_testing/BinaryOpSingleOutputDiff.h

public:		public:
static void run_perf(Func myFunc, Func otherFunc, const char *logFile) {		static void run_perf(Func myFunc, Func otherFunc, const char *logFile) {
testutils::OutputFileStream log(logFile);		testutils::OutputFileStream log(logFile);
log << " Performance tests with inputs in denormal range:\n";		log << " Performance tests with inputs in denormal range:\n";
run_perf_in_range(myFunc, otherFunc, /* startingBit= */ UIntType(0),		run_perf_in_range(myFunc, otherFunc, /* startingBit= */ UIntType(0),
/* endingBit= */ FPBits::MAX_SUBNORMAL, 1'000'001, log);		/* endingBit= */ FPBits::MAX_SUBNORMAL, 1'000'001, log);
log << "\n Performance tests with inputs in normal range:\n";		log << "\n Performance tests with inputs in normal range:\n";
run_perf_in_range(myFunc, otherFunc, /* startingBit= */ FPBits::MIN_NORMAL,		run_perf_in_range(myFunc, otherFunc, /* startingBit= */ FPBits::MIN_NORMAL,
/* endingBit= */ FPBits::MAX_NORMAL, 100'000'001, log);		/* endingBit= */ FPBits::MAX_NORMAL, 100'000'001, log);
		log << "\n Performance tests with inputs in normal range with exponents "
		"close to each other:\n";
		run_perf_in_range(
		myFunc, otherFunc, /* startingBit= */ FPBits(T(0x1.0p-10)).uintval(),
		/* endingBit= */ FPBits(T(0x1.0p+10)).uintval(), 10'000'001, log);
}		}
};		};

} // namespace testing		} // namespace testing
} // namespace __llvm_libc		} // namespace __llvm_libc

#define BINARY_OP_SINGLE_OUTPUT_DIFF(T, myFunc, otherFunc, filename) \		#define BINARY_OP_SINGLE_OUTPUT_DIFF(T, myFunc, otherFunc, filename) \
int main() { \		int main() { \

libc/test/src/math/exhaustive/CMakeLists.txt

add_fp_unittest(		add_fp_unittest(
DEPENDS		DEPENDS
.exhaustive_test		.exhaustive_test
libc.include.math		libc.include.math
libc.src.math.log2f		libc.src.math.log2f
libc.src.__support.FPUtil.fputil		libc.src.__support.FPUtil.fputil
LINK_OPTIONS		LINK_OPTIONS
-lpthread		-lpthread
)		)

		add_fp_unittest(
		hypotf_test
		NO_RUN_POSTBUILD
		NEED_MPFR
		SUITE
		libc_math_exhaustive_tests
		SRCS
		hypotf_test.cpp
		DEPENDS
		.exhaustive_test
		libc.include.math
		libc.src.math.hypotf
		libc.src.__support.FPUtil.fputil
		COMPILE_OPTIONS
		-O3
		LINK_OPTIONS
		-lpthread
		)

libc/test/src/math/exhaustive/hypotf_test.cpp

This file was added.

				//===-- Exhaustive test for hypotf ----------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "exhaustive_test.h"
				#include "src/__support/FPUtil/FPBits.h"
				#include "src/__support/FPUtil/Hypot.h"
				#include "src/math/hypotf.h"
				#include "utils/MPFRWrapper/MPFRUtils.h"
				#include "utils/UnitTest/FPMatcher.h"

				using FPBits = __llvm_libc::fputil::FPBits<float>;

				namespace mpfr = __llvm_libc::testing::mpfr;

				struct LlvmLibcHypotfExhaustiveTest : public LlvmLibcExhaustiveTest<uint32_t> {
				void check(uint32_t start, uint32_t stop,
				mpfr::RoundingMode rounding) override {
				// Range of the second input: [2^37, 2^48).
				constexpr uint32_t Y_START = (37U + 127U) << 23;
				constexpr uint32_t Y_STOP = (48U + 127U) << 23;

				mpfr::ForceRoundingMode r(rounding);
				uint32_t xbits = start;
				do {
				float x = float(FPBits(xbits));
				uint32_t ybits = Y_START;
				do {
				float y = float(FPBits(ybits));
				EXPECT_FP_EQ(__llvm_libc::fputil::hypot(x, y),
				__llvm_libc::hypotf(x, y));
				// Using MPFR will be much slower.
				// mpfr::BinaryInput<float> input{x, y};
				// EXPECT_MPFR_MATCH(mpfr::Operation::Hypot, input,
				// __llvm_libc::hypotf(x, y), 0.5,
				// rounding);
				} while (ybits++ < Y_STOP);
				} while (xbits++ < stop);
				}
				};

				// Range of the first input: [2^23, 2^24);
				static constexpr uint32_t START = (23U + 127U) << 23;
				static constexpr uint32_t STOP = ((23U + 127U) << 23) + 1;
				static constexpr int NUM_THREADS = 1;

				TEST_F(LlvmLibcHypotfExhaustiveTest, RoundNearestTieToEven) {
				test_full_range(START, STOP, NUM_THREADS, mpfr::RoundingMode::Nearest);
				}

				TEST_F(LlvmLibcHypotfExhaustiveTest, RoundUp) {
				test_full_range(START, STOP, NUM_THREADS, mpfr::RoundingMode::Upward);
				}

				TEST_F(LlvmLibcHypotfExhaustiveTest, RoundDown) {
				test_full_range(START, STOP, NUM_THREADS, mpfr::RoundingMode::Downward);
				}

				TEST_F(LlvmLibcHypotfExhaustiveTest, RoundTowardZero) {
				test_full_range(START, STOP, NUM_THREADS, mpfr::RoundingMode::TowardZero);
				}

libc/test/src/math/hypotf_hard_to_round.h

	//===-- Hard-to-round inputs for hypotf ------------------------------C++--===//			//===-- Hard-to-round inputs for hypotf ------------------------------C++--===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIBC_TEST_SRC_MATH_HYPOTTEST_HARD_TO_ROUND_H			#ifndef LLVM_LIBC_TEST_SRC_MATH_HYPOTTEST_HARD_TO_ROUND_H
	#define LLVM_LIBC_TEST_SRC_MATH_HYPOTTEST_HARD_TO_ROUND_H			#define LLVM_LIBC_TEST_SRC_MATH_HYPOTTEST_HARD_TO_ROUND_H

	#include "utils/MPFRWrapper/MPFRUtils.h"			#include "utils/MPFRWrapper/MPFRUtils.h"

	namespace mpfr = __llvm_libc::testing::mpfr;			namespace mpfr = __llvm_libc::testing::mpfr;

	constexpr int N_HARD_TO_ROUND = 1216;			constexpr int N_HARD_TO_ROUND = 1217;
	constexpr mpfr::BinaryInput<float> HYPOTF_HARD_TO_ROUND[N_HARD_TO_ROUND] = {			constexpr mpfr::BinaryInput<float> HYPOTF_HARD_TO_ROUND[N_HARD_TO_ROUND] = {
				{0x1.faf49ep+25f, 0x1.480002p+23f},
	{0x1.ffffecp-1f, 0x1.000002p+27},			{0x1.ffffecp-1f, 0x1.000002p+27},
	{0x1.900004p+34, 0x1.400002p+23}, /* 45 identical bits */			{0x1.900004p+34, 0x1.400002p+23}, /* 45 identical bits */
	{0x1.05555p+34, 0x1.bffffep+23}, /* 44 identical bits */			{0x1.05555p+34, 0x1.bffffep+23}, /* 44 identical bits */
	{0x1.e5fffap+34, 0x1.affffep+23}, /* 45 identical bits */			{0x1.e5fffap+34, 0x1.affffep+23}, /* 45 identical bits */
	{0x1.260002p+34, 0x1.500002p+23}, /* 45 identical bits */			{0x1.260002p+34, 0x1.500002p+23}, /* 45 identical bits */
	{0x1.fffffap+34, 0x1.fffffep+23}, /* 45 identical bits */			{0x1.fffffap+34, 0x1.fffffep+23}, /* 45 identical bits */
	{0x1.8ffffap+34, 0x1.3ffffep+23}, /* 45 identical bits */			{0x1.8ffffap+34, 0x1.3ffffep+23}, /* 45 identical bits */
	{0x1.87fffcp+35, 0x1.bffffep+23}, /* 47 identical bits */			{0x1.87fffcp+35, 0x1.bffffep+23}, /* 47 identical bits */

This is an archive of the discontinued LLVM Phabricator instance.

[libc] Improve hypotf performance with different algorithm correctly rounded to all rounding modes.
ClosedPublic
