This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Support conversion between fp16 and fp128
AbandonedPublic

Authored by zatrazz on Oct 26 2020, 11:00 AM.

Details

Summary

This is an updated version of https://reviews.llvm.org/D86453

This patch adds both extendhftf2 and trunctfhf2 to support
conversion between half-precision and quad-precision floating-point
values. They are enabled iff the compiler supports _Float16.

It also adjust the extendhfsf2, truncdfhf2 __truncsfhf2 to use
_Float16 when compiler supports it. On AArch64 it allows use the
native FP16 ABI, while on other architectures the expected current
semantic is preserved (arm for instance).

Diff Detail

Event Timeline

zatrazz created this revision.Oct 26 2020, 11:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2020, 11:00 AM
Herald added subscribers: Restricted Project, danielkiss, hiraditya and 2 others. · View Herald Transcript
zatrazz requested review of this revision.Oct 26 2020, 11:00 AM
MaskRay added a comment.EditedNov 17 2020, 10:26 PM

Apologies for my slow response.

Patches need to have a large context (see https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface I prefer arc diff)
The code generation part and the compiler-rt/lib/builtins part are logically independent and thus splitting the patch makes sense. The compiler-rt parts reasonable.

Adding folks with better AArch64 knowledge...

llvm/test/CodeGen/AArch64/arm64-fp128.ll
234

Add -NEXT whenever appropriate

MaskRay edited reviewers, added: psmith, efriedma; removed: howard.hinnant.Nov 17 2020, 10:27 PM

They are enabled iff the compiler supports _Float16.

Why are these not enabled for compilers which don't support _Float16? There is also the __fp16 type, which has been supported for much longer and could also generate code like this.

It also adjust the extendhfsf2, truncdfhf2 __truncsfhf2 to use _Float16 when compiler supports it.

This would be better (easier to review, bisect, etc) if split into a separate patch.

On AArch64 it allows use the native FP16 ABI,.

Why is changing the ABI of these functions (including the existing 16<->32 and 16<->64 bit ones) OK? Wouldn't this cause problems if linking clang-compiled code against libgcc? I also don't see any changes to tell the compiler that the ABI has changed.

while on other architectures the expected current semantic is preserved (arm for instance).

Are you sure about this? ARM has a hard-float ABI which passes fp16 values in FP registers, and I'd expect some other architectures to have different calling conventions for uint16_t vs _Float16.

Apologies for my slow response.

Patches need to have a large context (see https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface I prefer arc diff)

I keep forgetting this.

The code generation part and the compiler-rt/lib/builtins part are logically independent and thus splitting the patch makes sense. The compiler-rt parts reasonable.

Ok, I will split it.

Adding folks with better AArch64 knowledge...

Thanks.

They are enabled iff the compiler supports _Float16.

Why are these not enabled for compilers which don't support _Float16? There is also the __fp16 type, which has been supported for much longer and could also generate code like this.

Because fp16 is an ARM C extension and tying the builtin generation to ISO extended floating-point type allows to more generic code (once the backend supports the type the builtin will be built instead of tying to the target ABI, such as extendhfsf2.c with builds only for ARM_EABI__).

It also adjust the extendhfsf2, truncdfhf2 __truncsfhf2 to use _Float16 when compiler supports it.

This would be better (easier to review, bisect, etc) if split into a separate patch.

Right, this could be a subsequent path.

On AArch64 it allows use the native FP16 ABI,.

Why is changing the ABI of these functions (including the existing 16<->32 and 16<->64 bit ones) OK? Wouldn't this cause problems if linking clang-compiled code against libgcc? I also don't see any changes to tell the compiler that the ABI has changed.

While fp16 is supported on all architectures, _Float16 is supported only for 32-bit ARM, 64-bit ARM, and SPIR (as indicated by clang/docs/LanguageExtensions.rst). Also, fp16 is a storage format and promote to 'float' for argument passing and 64-bit ARM supports floating-point convert precision to half as base armv8.1-a instruction. It means that although extendhfsf2, truncdfhf2 __truncsfhf2 will be built for 64-bit ARM, they will be never used in practice (compiler won't emit libcall to them). And the patch does not change the ABI for 32-bit ARM, it will continue to pass _Float16 as uint16.

while on other architectures the expected current semantic is preserved (arm for instance).

Are you sure about this? ARM has a hard-float ABI which passes fp16 values in FP registers, and I'd expect some other architectures to have different calling conventions for uint16_t vs _Float16.

Even for the case where either Armv8.2-A 16-bit floating point extensions are available or for 32-bit ARM if neon-fp16 fpu is selected, the libcalls won't be generated (both have support instruction support for float conversion).

In fact, compiler-rt has an issue where gnu_d2h, gnu_f2h, gnu_h2d, gnu_h2f (same for __extendhfsf2, and related functions) rely on the target floating point ABI (it uses 'float' type, where libgcc uses the expected int types). As a noted in a previous patch review, I think these routines should be moved to 32-arm specific folder with the expected sematic (regardless of the float ABI).

They are enabled iff the compiler supports _Float16.

Why are these not enabled for compilers which don't support _Float16? There is also the __fp16 type, which has been supported for much longer and could also generate code like this.

Because fp16 is only supported for argument passing and function return only on specific 32-bit ARM FPU modes, otherwise it is a only a storage format. Even ACLE has deprecated it over _Float16, so I think the generic implementation for compiler-rt should focus on _Float16 support and make fp16 support (if required, since for architecture where it has argument/function return support such ARM there is no need to fp convertions libcalls).

To summarize, I will split the patch in:

  1. Add extendhftf2 and trunctfhf2
  2. extendhfsf2, truncdfhf2 __truncsfhf2 to use _Float16 when compiler supports it and maybe disable on AArch64 (since there is no need to provide them)
  3. Maybe fix the arm gnu_d2h, gnu_f2h, gnu_h2d, gnu_h2f

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2016-04/msg01766.html

zatrazz abandoned this revision.Nov 18 2020, 11:36 AM