
[X86][RFC] Enable `_Float16` type support on X86 following the psABI
Needs Review · Public

Authored by pengfei on Jul 29 2021, 8:18 AM.

Details

Summary

GCC and Clang/LLVM will support _Float16 on X86 in C/C++, following
the latest X86 psABI. (https://gitlab.com/x86-psABIs)

_Float16 arithmetic will be performed using native half-precision. If
native arithmetic instructions are not available, it will be performed
at a higher precision (currently always float) and then truncated down
to _Float16 immediately after each single arithmetic operation.
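The effect of truncating after every single operation can be sketched without any `_Float16` compiler support. The following is only an emulation in Python, assuming IEEE binary16 with round-to-nearest-even; the helper names (`to_half`, `to_single`, `add_f16_promoted`) are hypothetical and not part of the patch.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_single(x: float) -> float:
    """Round x to the nearest IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def add_f16_promoted(a: float, b: float) -> float:
    """Emulate one half-precision addition without native FP16 instructions:
    extend both operands to float, add in float, truncate to half right away."""
    wide = to_single(to_single(a) + to_single(b))
    return to_half(wide)

a, b, c = to_half(2048.0), to_half(1.0), to_half(1.0)

# Truncate to half after each operation, as the summary describes.
per_op = add_f16_promoted(add_f16_promoted(a, b), c)

# For contrast: keep the intermediate in float and truncate only once.
once = to_half(to_single(to_single(a + b) + c))

print(per_op, once)  # 2048.0 2050.0
```

Between 2048 and 4096 adjacent binary16 values are 2 apart, so 2049 is a tie that rounds back to 2048 at every step, while a single final rounding would yield 2050. Truncating after each operation is what keeps the promoted path consistent with native half-precision arithmetic.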

Diff Detail

Event Timeline

pengfei created this revision. Jul 29 2021, 8:18 AM
pengfei requested review of this revision. Jul 29 2021, 8:18 AM
Herald added projects: Restricted Project, Restricted Project. Jul 29 2021, 8:18 AM

I sent out this patch mainly as a PoC of the ABI changes; I'll fix the performance regressions in the next phase.
LLVM was using a different calling convention on x86 when passing and returning the half type, which conflicts with the current X86 psABI.
I have evaluated the risk internally and think the ABI change is low risk because Clang doesn't use that calling convention. But I may not have thought of everything. Questions and comments are appreciated.

pengfei added inline comments. Jul 29 2021, 8:38 AM
llvm/include/llvm/IR/RuntimeLibcalls.def
293–294

GCC 12 will provide the functions __extendhfsf2 and __truncsfhf2. I wonder whether I can change the names directly here or need extra customization for ARM/AArch64. What about other targets?
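For readers unfamiliar with these two helpers, their names suggest a half-to-single extension and a single-to-half truncation. The sketch below only models that value semantics in Python, assuming IEEE binary16 and round-to-nearest-even; the `_model` functions are hypothetical and do not reflect the libgcc or compiler-rt implementations or their argument-passing ABI.

```python
import struct

def extendhfsf2_model(h_bits: int) -> float:
    """Model of half -> single extension: reinterpret 16 bits as binary16.
    Every binary16 value is exactly representable in binary32, so this is exact."""
    return struct.unpack("<e", struct.pack("<H", h_bits))[0]

def truncsfhf2_model(x: float) -> int:
    """Model of single -> half truncation: round to nearest (ties to even)
    and return the resulting 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

print(extendhfsf2_model(0x3C00))        # 1.0
print(hex(truncsfhf2_model(-2.0)))      # 0xc000
print(hex(truncsfhf2_model(65504.0)))   # 0x7bff, the largest finite half
```

Because the extension is exact, extending and then truncating any non-NaN half value gives back the same bit pattern in this model.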

craig.topper added a comment. Edited Jul 29 2021, 8:52 AM

I haven't had a chance to look at this patch in detail, but I wanted to ask if you considered doing what ARM and RISCV do for this. They pass the f16 in the lower bits of an f32 by only changing the ABI handling code in the backend. The type legalizer takes care of the rest. That seems simpler than this patch. See for example https://reviews.llvm.org/D98670

Thanks, Craig, for the information. I referenced the AArch64 implementation. I think we have to add a legal f16 type this way because:

  1. We will support the _Float16 type in Clang on SSE2 and above to match GCC's behavior, so a legal type is a must.
  2. Using the lower 16 bits of an f32 may not satisfy the psABI's calling convention requirements for aggregate and complex types.
  3. We have some optimizations that leverage the F16C or AVX512 ps2ph/ph2ps instructions. A legal type is easy to customize.

Besides, we have full f16 arithmetic support in AVX512FP16. Most of the code here is shared and serves both scenarios. We just need to promote most FP operations and expand or customize FP_ROUND and FP_EXTEND here.
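The aggregate concern in point 2 can be pictured as a byte-layout difference. This is a rough Python sketch, assuming `_Float16` is a 2-byte, 2-byte-aligned scalar and that the "lower bits of an f32" scheme gives each value its own 32-bit slot with the upper 16 bits zero; the zero fill and the variable names are illustrative assumptions, not something the patch specifies.

```python
import struct

values = (1.5, -2.0)

# Two half members laid out back to back, 2 bytes each.
packed = struct.pack("<2e", *values)

# The same two values, each widened into its own 32-bit slot with the
# half bits in the low 16 bits (upper 16 bits zero; an assumption).
widened = b"".join(
    struct.pack("<HH", struct.unpack("<H", struct.pack("<e", v))[0], 0)
    for v in values
)

print(len(packed), packed.hex())    # 4 003e00c0
print(len(widened), widened.hex())  # 8 003e000000c00000
```

The packed form fits both members in 4 bytes, whereas the widened form needs 8, so a scheme that only rewrites scalar f16 arguments does not by itself produce the packed layout that a struct or complex value would use.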

pengfei updated this revision to Diff 362958. Jul 29 2021, 7:37 PM

Remove unused vector f16 definitions.

pengfei updated this revision to Diff 362974. Jul 29 2021, 11:43 PM

Add more conversion tests.

pengfei updated this revision to Diff 363461. Aug 2 2021, 6:02 AM
  1. Reverted several unrelated changes.
  2. Improved conversions to/from f64/f80 etc. under F16C.
  3. Added a combine to reduce intermediate move instructions.
  4. Refactored to fix several trivial problems.

After the last refactor, I think this patch is mostly ready.
This patch strips most of the ABI and _Float16 type related code out of D105263, which is left with only the AVX512-FP16 ISA enabling code.
I think this should make it friendlier to review. The downside is that all FP16 enabling patches now depend on, and are blocked by, this one. So I hope we can have a quick review and land it early.