This is an archive of the discontinued LLVM Phabricator instance.

[APFloat] Fix truncation of certain subnormal numbers
ClosedPublic

Authored by danilaml on Jun 6 2022, 11:33 AM.

Details

Summary

Certain subnormals would be incorrectly rounded away from zero.

Fixes #55838

Diff Detail

Event Timeline

danilaml created this revision.Jun 6 2022, 11:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2022, 11:33 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
danilaml requested review of this revision.Jun 6 2022, 11:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2022, 11:33 AM
danilaml added inline comments.Jun 6 2022, 11:36 AM
llvm/lib/Support/APFloat.cpp
2242

Not 100% sure it is safe to use this function here (after the shift), rather than just exponent == semantics->minExponent or something.

I'm not sure it's safe to assume that the lost fraction is lfLessThanHalf, in general. At least, it isn't obvious to me, particularly for cases involving bfloat.

We already have a bunch of code to adjust the shift amount for truncations: the "If this is a truncation of a denormal number [...]" block. Can we adjust that code to reduce the shift amount in this case?

I'm not sure it's safe to assume that the lost fraction is lfLessThanHalf, in general. At least, it isn't obvious to me, particularly for cases involving bfloat.

We already have a bunch of code to adjust the shift amount for truncations: the "If this is a truncation of a denormal number [...]" block. Can we adjust that code to reduce the shift amount in this case?

If I understood correctly (which is not a given, ofc), using lfLessThanHalf would just prevent rounding up whichever number results after right shift, which is exactly the expected behavior for stuff like bfloat (that just truncates).
Conversely, I'm not sure if it's safe to adjust the shift amount for truncations either. Is normalize expected to handle arbitrary input and "normalize" it to the current semantics, with the only exception being the "zero significand, non-zero lost fraction"?
I've tried adding an assert and quickly found the place where the code does exactly this:

/* Underflow to zero and round.  */
category = fcNormal;
zeroSignificand();
fs = normalize(rounding_mode, lfLessThanHalf);

in APFloat.cpp:2770, IEEEFloat::convertFromDecimalString (from the EXPECT_TRUE(APFloat(APFloat::IEEEdouble(), "1e-99999").isPosZero()); test).

danilaml updated this revision to Diff 434877.Jun 7 2022, 10:24 AM

Submitted an alternative solution (adjusting the shift/exponent)

In some cases, we might need to round up a denormal number. Suppose, for example, the result of a conversion is exactly half of the smallest denormal number, or slightly greater than that. Then in "nearest" rounding mode, we need to round up. And in the case of float->bfloat, exactly half of the smallest denormal bfloat is a denormal float.

So I'm really not comfortable just overwriting the lost_fraction.

The "Underflow to zero and round" dance in convertFromDecimalString is meant to handle different rounding modes, for example, round away from zero. normalize() rounds the result appropriately. So either lfExactlyZero or lfLessThanHalf is meaningful with a zero signficand, but lfMoreThanHalf isn't really.

normalize() should handle "arbitrary" inputs, in the sense that it can correctly handle significands and exponents which aren't directly representable in the destination floating-point format.

New patch looks better.

Can you add unittests for the float->bfloat case I mentioned before?

llvm/lib/Support/APFloat.cpp
2217

"caseisn't" -> "case isn't"

danilaml updated this revision to Diff 435153.Jun 8 2022, 7:22 AM

Added float->bfloat conversion tests

@efriedma I've added float->bloat16 conversion tests but I'm not sure these are exact same you had in mind.

efriedma accepted this revision.Jun 8 2022, 9:49 AM

LGTM

This revision is now accepted and ready to land.Jun 8 2022, 9:49 AM
This revision was automatically updated to reflect the committed changes.