This is an archive of the discontinued LLVM Phabricator instance.

[llvm][APFloat] Add NaN-in-negative-zero formats by AMD and GraphCore
ClosedPublic

Authored by krzysz00 on Jan 16 2023, 9:59 AM.

Details

Summary

AMD, GraphCore, and Qualcom have published a standard for 8-bit floats that
differs from the 8-bit floats defined by Nvidia, Intel, and ARM. This
commit adds support for these alternate 8-bit floats to APFloat in
order to enable their usage in MLIR. These formats are presented in
the paper at https://arxiv.org/abs/2206.02915 and are implemented in
GRaphCore hardware whose ISA is available at
https://docs.graphcore.ai/projects/isa-mk2-with-fp8/en/latest/_static/TileVertexISA-IPU21-1.3.1.pdf .

In these formats, like the existing Float8E4M3FN, there are no
infinity values and there is only one NaN. Unlike in that format,
however, the NaN values is 0x80, which would be negative 0 in IEEE
formats. This means that these formats also make 0 unsigned.

To allow for these new variant semantics, this commit adds
fltNanEncoding, which can be IEEE (the default), AllOnes (used by
Fleat8E4M3FN), or NegativeZero (used by the new formats,
Float8E5M2FNUZ and Float8E4M3FNUZ). Normalization, arithmetic, and
other such routines have been updated to account for the potential
variant semantics.

The two new formats are Float8E5M2FNUZ (5 bits exponent, 2 bits
mantissa, finite, unsigned zero) and Float8E4M3FNUZ (4 bits exponent,
3 bits mantissa, finite, unsigned zero).

Diff Detail

Event Timeline

krzysz00 created this revision.Jan 16 2023, 9:59 AM
krzysz00 requested review of this revision.Jan 16 2023, 9:59 AM
krzysz00 updated this revision to Diff 490316.Jan 18 2023, 3:22 PM

Copy over missed tests from GraphCore's diff

jakeh-gc added inline comments.
llvm/unittests/ADT/APFloatTest.cpp
1877

There are now 20 test cases, not 16.
12 wasn't a correct value for this before.

My preference is to use std::size here to avoid these issues in future.

krzysz00 updated this revision to Diff 491459.Jan 23 2023, 11:11 AM

Address bug in tests

jakeh-gc accepted this revision.Jan 25 2023, 1:10 AM

Looks good 👍

This revision is now accepted and ready to land.Jan 25 2023, 1:10 AM

Adding people who've recently touched APFloat as reviewers.

krzysz00 updated this revision to Diff 492186.Jan 25 2023, 10:52 AM

Add tests for getSmallestNormalized(), changeSign()

mehdi_amini added inline comments.
llvm/lib/Support/APFloat.cpp
76

Typo: point

77

typo: represented

80

typo: behavior

lattner added a subscriber: lattner.Feb 1 2023, 5:54 PM
lattner added inline comments.
llvm/include/llvm/ADT/APFloat.h
161

Typo follwing

171

Typo follwing

lattner accepted this revision.Feb 1 2023, 5:54 PM

Looks fine with the requested changes.

reedwm added a comment.Feb 1 2023, 9:12 PM

Looks good, just have some minor comments.

llvm/include/llvm/ADT/APFloat.h
165

You should document that the exponent bias is 16, since it differs from the IEEE standard exponent bias of 2**(num_exponent_bits - 1) - 1. And same for S_Float8E4M3FNUZ.

llvm/lib/Support/APFloat.cpp
62

This comment should be updated to mention the new dtypes, and also the NaN description no longer is accurate for the new dtypes.

Also, "Float8E5M2" should be replaced with "Float8E4M3FN". I accidentally put the wrong dtype in the comment when adding Float8E4M3FN.

77

The sign bit can still be zero. I would clarify that for NaNs the exponent and mantissa are all 1s.

81

Also mention that for NaNs, the mantissa is also all 0s (you only mentioned the exponent is all 0s).

840–841

This comment should be moved to the "else" branch you added (or just delete the comment).

2506

Don't we only lose info if the source of the conversion is -0?

llvm/unittests/ADT/APFloatTest.cpp
1855

What does this TODO mean?

2078

Typo: castss -> casts

5280

Add the word "negative". And similarly in Float8E4M3FNUZNext

5342–5343

This is the same as the next test case

5376

typo: hives -> gives

5570

This looks pretty similar to Float8E5M2FNUZExhaustive. Maybe merge them.

krzysz00 updated this revision to Diff 494425.Feb 2 2023, 1:31 PM
krzysz00 marked 17 inline comments as done.

Fix typos, merge exhaustive tests, get hit by the clang-format stick

reedwm added a comment.Feb 2 2023, 4:37 PM

Just some more minor nits about comments.

llvm/lib/Support/APFloat.cpp
78–79

fltNanEncoding::AllOnes technically has two NaN representations: a NaN with the sign bit set and a NaN with the sign bit unset. I would change this setence to "The details of the NaN representation(s) are determined by the fltNanEncoding enum

100

typo: hhaving -> having

842

typo: signallling -> signalling

llvm/unittests/ADT/APFloatTest.cpp
5414

Typo: smalest -> smallest

krzysz00 updated this revision to Diff 495139.Feb 6 2023, 7:44 AM
krzysz00 marked 4 inline comments as done.

Fix a few more typos and an incorrect comment

llvm/lib/Support/APFloat.cpp
2506

I don't think so? If I understand losesInfo correctly, it's for (non-rounding-related?) cases where, for x : T, convert(T, convert(U, x)) != x is possible.

So since both +0 and -0 map to +0, we've lost information.

llvm/unittests/ADT/APFloatTest.cpp
1855

Stray comment that was a note about adding what's currently const bool signedZero; above.

reedwm added inline comments.Feb 8 2023, 10:05 PM
llvm/lib/Support/APFloat.cpp
2506

I don't think so? If I understand losesInfo correctly, it's for (non-rounding-related?) cases where, for x : T, convert(T, convert(U, x)) != x is possible.

By that logic, losesInfo should be false if the source of the conversion is +0.

E.g., in your equation convert(T, convert(U, x)) != x, if x is the FP32 value +0, T is FP32, and U is FP8, then the equation is false, since converting +0 to FP8 and back results in the FP32 value +0. So losesInfo should be false in that case. On the other hand, if x is the FP32 value -0, the equation is true, since converting -0 to FP8 and back results in the FP32 value +0, and so losesInfo should be true.

krzysz00 added inline comments.Feb 9 2023, 8:21 AM
llvm/lib/Support/APFloat.cpp
2506

Ok, to rephrase: we know that converting f32 Inf or NaN to any of the FN APFloat formats loses info, because there's two distinct concepts that get collapsed down to one value. Similarly, in our formats, putting in a 0 from a format that isn't unsigned-zero-only also loses info, as now +0 and -0 have been collapsed down to one value and you can't tell which one you had on the way back out.

krzysz00 added inline comments.Feb 9 2023, 8:23 AM
llvm/lib/Support/APFloat.cpp
2506

Ok, having looked at the comment, on the function again you were right, +0 should not lose info but -0 should.

krzysz00 updated this revision to Diff 496163.Feb 9 2023, 10:00 AM

Fix correctness of losesInfo for +0 and -0

reedwm accepted this revision.Feb 9 2023, 10:59 AM
This revision was landed with ongoing or failed builds.Feb 9 2023, 2:08 PM
This revision was automatically updated to reflect the committed changes.