Pursuant to RFC discussions, this change enhances the handling of the __bf16 type in Clang.
- Firstly, it upgrades __bf16 from a storage-only type to an arithmetic type.
- Secondly, it changes the mangling of __bf16 to DF16b on all architectures except ARM. This change has been made in accordance with the finalization of the mangling for the std::bfloat16_t type, as discussed at https://github.com/itanium-cxx-abi/cxx-abi/pull/147.
- Finally, this commit extends the existing excess precision support to the __bf16 type. This applies to hardware architectures that do not natively support bfloat16 arithmetic.
Appropriate tests have been added to verify the effects of these changes and ensure no regressions in other areas of the compiler.
Suggested rework:
Clang supports three half-precision (16-bit) floating point types:
``__fp16``, ``_Float16`` and ``__bf16``. These types are supported in all
language modes, but not on all targets:

- ``__fp16`` is supported on every target.

- ``_Float16`` is currently supported on the following targets:

  * 32-bit ARM (natively on some architecture versions)
  * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
  * AMDGPU (natively)
  * SPIR (natively)
  * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)

- ``__bf16`` is currently supported on the following targets:

  * 32-bit ARM
  * 64-bit ARM (AArch64)
  * X86 (when SSE2 is available)

(For X86, SSE2 is available on 64-bit and all recent 32-bit processors.)

``__fp16`` and ``_Float16`` both use the binary16 format from IEEE
754-2008, which provides a 5-bit exponent and an 11-bit significand
(counting the implicit leading 1). ``__bf16`` uses the `bfloat16
<https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_ format,
which provides an 8-bit exponent and an 8-bit significand; this is the same
exponent range as ``float``, just with greatly reduced precision.

``_Float16`` and ``__bf16`` follow the usual rules for arithmetic
floating-point types. Most importantly, this means that arithmetic operations
on operands of these types are formally performed in the type and produce
values of the type. ``__fp16`` does not follow those rules: most operations
immediately promote operands of type ``__fp16`` to ``float``, and so
arithmetic operations are defined to be performed in ``float`` and so result in
a value of type ``float`` (unless further promoted because of other operands).
See below for more information on the exact specifications of these types.

Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
native hardware support for arithmetic in their corresponding formats.
The exact conditions are described in the lists above.
When compiling for a processor without native support, Clang will perform the
arithmetic in ``float``, inserting extensions and truncations as necessary.
This can be done in a way that exactly emulates the behavior of hardware
support for arithmetic, but it can require many extra operations. By default,
Clang takes advantage of the C standard's allowances for excess precision in
intermediate operands in order to eliminate intermediate truncations within
statements. This is generally much faster but can generate different results
from strict operation-by-operation emulation.

The use of excess precision can be independently controlled for these two
types with the ``-ffloat16-excess-precision=`` and
``-fbfloat16-excess-precision=`` options. Valid values include:

- ``none`` (meaning to perform strict operation-by-operation emulation)
- ``standard`` (meaning that excess precision is permitted under the rules
  described in the standard, i.e. never across explicit casts or statements)
- ``fast`` (meaning that excess precision is permitted whenever the
  optimizer sees an opportunity to avoid truncations; currently this has no
  effect beyond ``standard``)

The ``_Float16`` type is an interchange floating type specified in
ISO/IEC TS 18661-3:2015 ("Floating-point extensions for C"). It will
be supported on more targets as they define ABIs for it.

The ``__bf16`` type is a non-standard extension, but it generally follows
the rules for arithmetic interchange floating types from ISO/IEC TS
18661-3:2015. In previous versions of Clang, it was a storage-only type
that forbade arithmetic operations. It will be supported on more targets
as they define ABIs for it.

The ``__fp16`` type was originally an ARM extension and is specified
by the `ARM C Language Extensions <https://github.com/ARM-software/acle/releases>`_.
Clang uses the ``binary16`` format from IEEE 754-2008 for ``__fp16``,
not the ARM alternative format.
Operators that expect arithmetic operands immediately promote ``__fp16``
operands to ``float``.

It is recommended that portable code use ``_Float16`` instead of ``__fp16``,
as it has been defined by the C standards committee and has behavior that is
more familiar to most programmers.

Because ``__fp16`` operands are always immediately promoted to ``float``, the
common real type of ``__fp16`` and ``_Float16`` for the purposes of the usual
arithmetic conversions is ``float``.

A literal can be given ``_Float16`` type using the suffix ``f16``. For example,
``3.14f16``.

Because default argument promotion only applies to the standard floating-point
types, ``_Float16`` values are not promoted to ``double`` when passed as
variadic or untyped arguments. As a consequence, some caution must be taken
when using certain library facilities with ``_Float16``; for example, there is
no ``printf`` format specifier for ``_Float16``, and (unlike ``float``) it will
not be implicitly promoted to ``double`` when passed to ``printf``, so the
programmer must explicitly cast it to ``double`` before using it with a ``%f``
or similar specifier.
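
To illustrate why excess precision can change results compared with strict operation-by-operation emulation, here is a minimal, portable sketch that models bfloat16 rounding in software. It does not use ``__bf16`` and is not how Clang implements the feature; the helper names ``bf16_round``, ``sum_strict`` and ``sum_excess`` are hypothetical, and the code assumes ``float`` is IEEE 754 binary32 with round-to-nearest-even.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 binary32 value. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: round a float to the nearest value representable in
 * bfloat16 (round-to-nearest, ties-to-even) and return it as a float. */
float bf16_round(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7F800000u) == 0x7F800000u) {
        /* Infinity or NaN: set the quiet bit so a NaN stays a NaN
         * once the low 16 bits are cleared below. */
        if (bits & 0x007FFFFFu)
            bits |= 0x00400000u;
    } else {
        /* Add half of the discarded range, biased so that ties go to even. */
        bits += 0x7FFFu + ((bits >> 16) & 1u);
    }
    bits &= 0xFFFF0000u; /* keep sign, 8 exponent bits, 7 stored fraction bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Models strict emulation: round back to bfloat16 after every operation. */
float sum_strict(float a, float b, float c) {
    return bf16_round(bf16_round(a + b) + c);
}

/* Models excess precision: keep the intermediate in float, round once. */
float sum_excess(float a, float b, float c) {
    return bf16_round(a + b + c);
}
```

With the inputs 256, 1 and 1, ``sum_strict`` yields 256, because 257 is not representable with an 8-bit significand and each intermediate rounds back to 256, while ``sum_excess`` yields 258. This roughly corresponds to the difference between the ``none`` setting and the ``standard`` setting within a single expression.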