The x86 psABI has been updated to support the __bf16 type, whose ABI is the
same as FP16's. See https://discourse.llvm.org/t/patch-add-optional-bfloat16-support/63149
Event Timeline
How are you actually implementing __bf16 on these targets? There isn't even hardware support for conversions.
bf16 -> float is really just a bit shift. The other direction gets lowered to a libcall; compiler-rt has a conversion function with proper rounding. I added some support to make the backend promote all other arithmetic to float, but I think that's only enabled on x86 so far.
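For readers following along, the two conversions described in this comment can be sketched in portable C. This is only an illustration under stated assumptions: the function names bf16_bits_to_float and float_to_bf16_bits are hypothetical, the bf16 value is carried as a raw uint16_t bit pattern, and the rounding shown is round-to-nearest-even. It is not the compiler-rt routine or the backend lowering itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The sketch assumes float is a 32-bit IEEE-754 binary32. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* bf16 -> float: the 16 bf16 bits are the high half of a binary32,
 * so widening is a 16-bit left shift. (Hypothetical helper name.) */
float bf16_bits_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* float -> bf16 with round-to-nearest-even. (Hypothetical helper name;
 * an illustration of "proper rounding", not compiler-rt's code.) */
uint16_t float_to_bf16_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    /* NaN: truncate and force a mantissa bit so the result stays a NaN. */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Add 0x7FFF plus the lowest kept bit, so exact ties round to even. */
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}
```

Under these assumptions, widening is exact, while narrowing has to round away the low 16 mantissa bits, which is why only that direction needs more than a shift.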
We support float -> bf16 in AVX512BF16. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#avx512techs=AVX512_BF16
We also found some problems with how bf16 types are represented in intrinsics. For example, we currently define __bfloat16 as unsigned short, so we cannot stop a user from, e.g., adding two __bfloat16 values in C code and getting the wrong result. So we want to introduce the type on X86. For more information, please see the discussions in D120395.
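The problem with an integer typedef can be illustrated with a small C sketch. The names bf16_bits_t, widen, add_as_integers and add_as_floats are hypothetical stand-ins introduced here for illustration; only the idea that the old type is an unsigned short comes from the comment above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a bf16 type defined as unsigned short. */
typedef unsigned short bf16_bits_t;

/* Reinterpret the 16 bf16 bits as the high half of a binary32 float. */
float widen(bf16_bits_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* What the compiler does for `a + b` on the integer typedef:
 * it adds the raw bit patterns, which is not a floating-point add. */
bf16_bits_t add_as_integers(bf16_bits_t a, bf16_bits_t b) {
    return (bf16_bits_t)(a + b);
}

/* What the user most likely meant: widen, then add as floats. */
float add_as_floats(bf16_bits_t a, bf16_bits_t b) {
    return widen(a) + widen(b);
}
```

With a = b = 0x3F80 (the bf16 encoding of 1.0), the integer add yields the bit pattern 0x7F00, which decodes to 2^127 rather than 2.0, and the compiler accepts it silently. A distinct __bf16 type lets the compiler reject or handle such arithmetic instead.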
Yes, you can view it as the x86 backend already dealing with __bf16. With https://reviews.llvm.org/D130832, it will completely follow the psABI. As for hardware support, x86 has actually supported bf16 since AVX512BF16 (https://reviews.llvm.org/D60552), which has vector conversion support between float and bf16. However, at that time we chose a typedef of short as the C type. In the future, we can support backend lowering for those instructions: VCVTNE2PS2BF16, VCVTNEPS2BF16 and VDPBF16PS.
Right, but this patch is adding x86 support whenever SSE2 is available. AVX512BF16 is available on a *very* small slice of processors. In contrast, e.g. F16C is relatively broadly available, although I understand that we formally support _Float16 all the way back to SSE2 and thus on some processors that lack F16C.
But okay, pure intrinsic support is fine if that's what we're doing.
I think the patch looks fine.
Yes. This type is for pure intrinsic support. Thanks for your review. Let's wait for the backend patch to land first.