Page MenuHomePhabricator

Add APFloat and MLIR type support for fp8 (e5m2).
ClosedPublic

Authored by stellaraccident on Sep 13 2022, 5:43 PM.

Details

Summary

This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.

This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf

As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.

Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:

  • F8M<N> : For FP8 types that can be conceived of as following the same rules as FP16 but with a smaller number of mantissa/exponent bits. Including the number of mantissa bits in the type name is enough to fully specify the type. This naming scheme is used to represent the E5M2 type described in the paper.
  • F8M<N>F : For FP8 types such as E4M3 which only support finite values.

The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).

Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque I8
values that are applicable to some instructions.

MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.

(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald Transcript
stellaraccident published this revision for review.Sep 14 2022, 9:05 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 14 2022, 9:05 AM

I imagine APFloat changes likely require an RFC (though I'm not sure). Also, did you accidentally reformat all of the APFloat tests (4k lines change o.O)

I imagine APFloat changes likely require an RFC (though I'm not sure). Also, did you accidentally reformat all of the APFloat tests (4k lines change o.O)

Sorry: noted this patch in the RFC but didn't put the reverse link here: https://discourse.llvm.org/t/rfc-add-apfloat-and-mlir-type-support-for-fp8-e5m2/65279

Ugh, yeah, APFloatTest was very not formatted and clang-format "helped". I tried to stack an NFC clang-format under this one but seem to have messed something up. Could you offer any advice on what to do (I was thinking of just pre-landing an NFC clang-format of APFloatTest and then rebasing on top of that).

I imagine APFloat changes likely require an RFC (though I'm not sure). Also, did you accidentally reformat all of the APFloat tests (4k lines change o.O)

Sorry: noted this patch in the RFC but didn't put the reverse link here: https://discourse.llvm.org/t/rfc-add-apfloat-and-mlir-type-support-for-fp8-e5m2/65279

Ugh, yeah, APFloatTest was very not formatted and clang-format "helped". I tried to stack an NFC clang-format under this one but seem to have messed something up. Could you offer any advice on what to do (I was thinking of just pre-landing an NFC clang-format of APFloatTest and then rebasing on top of that).

There is a script to only run clang-format on the area you changed. https://clang.llvm.org/docs/ClangFormat.html#script-for-patch-reformatting

Format and apply rename suggestions.

git-clang-format

Formatted and naming suggestions from RFC applied. Ready for review.

bkramer accepted this revision.Sep 24 2022, 3:12 AM

Code looks good. Were the concerns about naming resolved?

This revision is now accepted and ready to land.Sep 24 2022, 3:12 AM

Code looks good. Were the concerns about naming resolved?

I believe so for this one, which is fairly uncontroversial. Would still like a review by someone on the MLIR side, though.

rengolin accepted this revision.Sep 26 2022, 9:34 AM
rengolin added a subscriber: rengolin.

Honestly, there isn't much meat here, other than the decision to support and the name bikeshedding, both of which I think are well covered in the RFC.

Most of the code is mechanical changes to adding a new type and in that are perfectly fine.

The main questions for me are:

  1. Do we want float8? YES!
  2. Do we want this specific variant? Yes, IMO.
  3. Is adding a single variant (amongst so many) worth it? Yes, at the very least for initial support. Adding others or changing this should be easy.
  4. Do we want a more generic implementation which we derive specific cases from? Maybe. Regardless, this is an initial implementation and we can generalise later, if desired.
  5. Is the end goal to have only concrete cases listed? Strong YES from me. This should be less than 1/2 dozen and we can cope with that, even if we don't have a generic implementation.
  6. Does this patch satisfy all the constraints above? Yes, IMO.

The only issue that could be raised for adding a specific one is that of backwards compatibility. As long as we're sure what we want before forking the next version (Jan-ish), it should be fine to experiment and gather opinions.

So, my approval of it is exactly that: my support for this change. As usual, please wait for more approvals to make sure we have enough support for the specific way you implemented it.

Renato

Honestly, there isn't much meat here, other than the decision to support and the name bikeshedding, both of which I think are well covered in the RFC.

I scoped it down to the hardest topic in computer science: what to call it ;)

Bkramer; was your intention to review both the APFloat and MLIR side of the patch? Just making sure that the MLIR side had a proper look before submitting and it wasn't clear to me from the commentary.

Bkramer; was your intention to review both the APFloat and MLIR side of the patch? Just making sure that the MLIR side had a proper look before submitting and it wasn't clear to me from the commentary.

Both the LLVM and MLIR sides look good to me (but I have less experience on the MLIR side).

jpienaar accepted this revision.Sep 28 2022, 6:14 AM

Looks good thanks, naming discussions always fun.

mlir/test/lib/Dialect/Test/TestOps.td
197

I'll file an issue if you haven't already.

This revision was landed with ongoing or failed builds.Oct 2 2022, 5:25 PM
This revision was automatically updated to reflect the committed changes.
chapuni added a subscriber: chapuni.Oct 2 2022, 6:54 PM
chapuni added inline comments.
llvm/include/llvm/ADT/APFloat.h
160

This triggers a warning in clang/lib/AST/MicrosoftMangle.cpp:mangleFloat. Any idea?

vitalybuka reopened this revision.Oct 2 2022, 9:23 PM
This revision is now accepted and ready to land.Oct 2 2022, 9:23 PM
llvm/include/llvm/ADT/APFloat.h
160

I'm not familiar with that code but it appears that it is assuming full coverage of every APFloat semantic to an msvc mangling -- which can't be valid. It looks like the way such things are handled within that file for errors is llvm_unreachable so should probably add a default with that.

craig.topper added inline comments.Oct 3 2022, 10:45 AM
llvm/include/llvm/ADT/APFloat.h
160

Probably best to use llvm_unreachable for S_Float8E5M2 instead of adding a default. That will keep the fully covered switch warning if another semantic is added in the future.

Add fix to MicrosoftMangle.cpp that caused buildbot failure.

Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2022, 5:14 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Switch to explicit case per comment from ctopper.

stellaraccident marked an inline comment as done.

Remove break after llvm_unreachable for consistency with other switches in file.

jpienaar accepted this revision.Oct 4 2022, 5:26 PM
This revision was landed with ongoing or failed builds.Oct 4 2022, 5:40 PM
This revision was automatically updated to reflect the committed changes.

Ran the full clang/llvm/mlir test suite in debug mode just to be safe.