This is an archive of the discontinued LLVM Phabricator instance.

[builtins] Unify the softfloat division implementation
ClosedPublic

Authored by atrosinenko on Jul 31 2020, 8:00 AM.

Details

Summary

This patch replaces three different pre-existing implementations of the div[sdt]f3 LibCalls with a single generic one, as is already done for many other LibCalls.

The patch was written with the intent of making the correctness proof as self-contained as possible, so that future contributors do not have to re-derive it. On the other hand, this may make it look somewhat cluttered, so feedback on both correctness and readability is highly appreciated.

When fuzzing with AFL++ (25M iterations for each type width), just one error was found: for single precision, 0x1.fffffep-126F divided by 2.F was not correctly rounded to exactly 1.0. On the other hand, this patch is an intentionally simplified version of the full patch that introduces proper support for subnormal results; the full version fixes this issue as well.

This particular diff is presented as an NFC refactoring, and technically the above issue is not a regression, because the original implementation yields the same result. :)

Diff Detail

Event Timeline

atrosinenko created this revision.Jul 31 2020, 8:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2020, 8:00 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
atrosinenko requested review of this revision.Jul 31 2020, 8:01 AM

Revert auto-linting

Here are the benchmark and fuzzing harness used to test this patch.

compiler-rt/lib/builtins/fp_div_impl.inc
249–250

Interesting fact: swapping these two seemingly commuting lines makes code slower by 15-25%. This applies to current Clang as well as to clang-8 from Ubuntu 20.04 repository.

Refactoring

Re-upload after D85731: [NFC][builtins] Make softfloat-related errors less noisy to get rid of "error: unknown type name 'fp_t'" and similar clang-tidy diagnostics for fp_div_impl.inc.

atrosinenko edited the summary of this revision. (Show Details)Aug 12 2020, 2:01 AM
atrosinenko added a reviewer: sepavloff.

On linter diagnostics: the error messages are due to the linter trying to lint a *.inc file that is not self-contained by design. D85731: [NFC][builtins] Make softfloat-related errors less noisy tries to make those errors more meaningful, at least.

Some of the readability-identifier-naming warnings are just due to the code style of a runtime library written in C being different from most of the LLVM C++ code base, while some may point at actual issues with my variable names (though here the preference was to carry as many hints as possible in the names).

Rebase the entire patch stack against the up-to-date master and re-upload.

sepavloff added inline comments.Aug 20 2020, 9:17 AM
compiler-rt/lib/builtins/fp_div_impl.inc
100

This estimation is absent from the original comment. Do you have a reference for where it came from? Also, the original comment states This is accurate to about 3.5 binary digits. Is that still true? If yes, it could be worth copying here.

102–103

The original comment states:

// This doubles the number of correct binary digits in the approximation
// with each iteration.

Is it still true in this implementation? If yes, it could be worth copying here.

110

This is a good optimization. Could you please add a short comment describing the idea of using half-sized temporaries?

115

In what cases are 16-bit temporaries used? NUMBER_OF_HALF_ITERATIONS is set to zero in divsf3.c.

131

It would be better to add a short comment explaining the use of 0 instead of 2.

185

x_UQ0_hw and b_UQ1_hw are declared inside the conditional block #if NUMBER_OF_HALF_ITERATIONS > 0. Does NUMBER_OF_FULL_ITERATIONS != 1 always imply NUMBER_OF_HALF_ITERATIONS > 0 ?

atrosinenko added a comment.EditedAug 20 2020, 10:23 AM

Thank you @sepavloff !

Some general context: The final goal was to have an explanation of why this particular number of iterations (3, 4, or 5, depending on the type) is enough for any a and b passed as input arguments, taking into account the errors due to particular finite-precision computations. Initially, I tried to just "mechanically" unify the three implementations and their comments like this. After trying for a while, it turned out I could not simply pick up the original proof and add the subnormal case: some of the statements were too vague, some seemed insufficiently explained, etc. Then an attempt was made to re-prove it from scratch, with the intention of making the resulting explanation as self-contained as possible. The implementation, on the other hand, gathers various features of the three original functions plus some hacks that make it easier to prove.

compiler-rt/lib/builtins/fp_div_impl.inc
100

This approximation was deduced by writing down the derivative of f "in infinite precision" and finding its root. Then the values of f at its root, at 1.0, and at 2.0 were calculated -- as far as I remember, all of them were 3/4 - 1/sqrt(2) or its negation. This is probably what "minimax polynomial" means; the term was just copied from the original implementation :).

102–103

To me this looks too vague. It is probably approximately true, but I don't know how exactly it should be interpreted.

110

The idea is just "I guess this takes less CPU time, and I have managed to prove error bounds for it". :) Specifically, for float128 the rep_t * rep_t multiplication will be emulated with lots of CPU instructions, while the lower half contains only noise at that point. This particular optimization existed in the original implementation for float64 and float128. For float32 it did not make much sense, I guess. Still, error estimations were calculated for float32 with half-size iterations, as that may be useful for MSP430 and other 16-bit targets.

115

Agreed, this needs to be re-evaluated, and at least a comment should be added. This could be dead code for now; it was expected to speed things up on 16-bit targets that sometimes even lack hardware multiplication (such as MSP430).

131

Agreed, it was expected to be something like /* = 2.0 in UQ1.(HW-1) */. Naming things is especially painful here...

185

Does NUMBER_OF_FULL_ITERATIONS != 1 always imply NUMBER_OF_HALF_ITERATIONS > 0 ?

Hmm... At first glance, it should imply == 0... Generally, the total number of iterations should be 3 for f32, 4 for f64, and 5 for f128; the error bounds are then calculated. There are generally only two modes: n-1 half-size iterations + 1 full-size iteration, OR n full-size iterations (one generally gains no performance from using 16x16-bit multiplications, on the one hand, and that particular case turned out to require extra rounding, on the other).

sepavloff added inline comments.Aug 20 2020, 10:09 PM
compiler-rt/lib/builtins/fp_div_impl.inc
100

IIUC, you don't want to put this statement here because you are not sure it is true? Sounds reasonable.

110

The idea is clear, but it requires some study of the sources. I would propose adding a comment saying:

At the first iterations the number of significant digits is small, so we may use values of a shorter type; operations on them are usually faster.

or something like that.

131

2.0 cannot be represented in UQ1.X. I would add a comment line like:

Due to wrapping, 2.0 in UQ1.X is equivalent to 0.

or something similar.

185

I have a concern that x_UQ0_hw and x_UQ1_hw are declared inside the block guarded by the condition NUMBER_OF_HALF_ITERATIONS > 0 but used in another block guarded by !USE_NATIVE_FULL_ITERATIONS && NUMBER_OF_HALF_ITERATIONS > 0, so there may be a combination of the macros under which the variables are used but not declared. Maybe that is impossible for some reason; in that case, a proper check may be put into the #ifdef USE_NATIVE_FULL_ITERATIONS block asserting that NUMBER_OF_HALF_ITERATIONS > 0. Otherwise, x_UQ0_hw and x_UQ1_hw need to be moved out of the conditional block.

atrosinenko updated this revision to Diff 287351.EditedAug 24 2020, 5:36 AM

Addressed the review comments mostly by clarifying the explanations.

I expect this code to have no unresolved review comments now. Please feel free to request further explanations in case anything is unclear.

sepavloff accepted this revision.Aug 26 2020, 6:41 AM

LGTM.

I don't fully understand the magic of fixing possible overflow; I hope you have done enough investigation and testing to be sure it works as expected.
Please wait a couple of days before commit, so that other reviewers could make their notes.

compiler-rt/lib/builtins/fp_div_impl.inc
143

Should the right-hand side contain 1/b?

This revision is now accepted and ready to land.Aug 26 2020, 6:41 AM

No-change re-upload: rebase onto current master branch.

atrosinenko added inline comments.Aug 27 2020, 9:14 AM
compiler-rt/lib/builtins/fp_div_impl.inc
143

What line are you referring to? For line 142, e_0 is defined as x_n - 1/b_hw in infinite precision (please note that it intentionally refers to b_hw, which is a truncated version of b; see lines 113-114).

This update is expected to be completely NFC w.r.t. code behavior and to significantly clarify the proof up to the end of the half-width iterations.

In particular, the reasoning about possible overflow of intermediate results turned out to be unclear/incorrect.

@sepavloff could you take a look at the new version in case it clarifies some of your questions? Another update, covering the second half of the function, may follow slightly later.

Add some other explanations.

Add more clarifications, fix explanation for "why it is enough to adjust only once in case of overflow".

The new comments are much better, thank you!
I think this version may be committed.

Clarify rounding-related part of function.

scanon added inline comments.Aug 31 2020, 6:54 AM
compiler-rt/lib/builtins/fp_div_impl.inc
100

To be precise, what _minimax polynomial_ means is that p(x) = 3/4 + 1/sqrt(2) - x/2 is the first-order polynomial that minimizes the error term max(|1/x - p(x)|) on the interval [1,2]. I.e. every other linear polynomial would achieve a larger maximum error.

The bound of a minimax approximation to a well-behaved function is always achieved at the endpoints, so we can just evaluate at 1 to get the max error: |1/1 - 3/4 - 1/sqrt(2) + 1/2| = 3/4 - 1/sqrt(2) = 0.04289... (which is actually about _4.5_ bits).

102–103

N-R is quadratically convergent under a bunch of assumptions on how good the initial guess is and bounds on the second derivative, which are all satisfied here, but probably not worth going into in the comments. IIRC the usual reference here is Montuschi and Mezzalama's "Survey of square rooting algorithms" (1990).

110

This is absolutely standard in HW construction of pipelined iterative dividers and square root units, so I'm not sure how much explanation is really needed =)

sepavloff added inline comments.Aug 31 2020, 7:58 AM
compiler-rt/lib/builtins/fp_div_impl.inc
110

This is absolutely standard in HW construction of pipelined iterative dividers and square root units, so I'm not sure how much explanation is really needed =)

I think the code now has enough explanations to be easily understood by mere mortals as well :)

Update after the latest comments.

Thank you very much for review!

I have amended this diff based on the latest comment by @scanon.

So, I will land D85031 and then D85032: [builtins] Make divXf3 handle denormal results if there are no other objections.

This revision was landed with ongoing or failed builds.Sep 1 2020, 9:25 AM
This revision was automatically updated to reflect the committed changes.