The X86 psABI has been updated to support the __bf16 type, whose ABI is
the same as that of FP16. See https://discourse.llvm.org/t/patch-add-optional-bfloat16-support/63149
We discussed this problem in D120395. Now that we have an ABI for
__bf16, we can re-implement the AVX512-BF16 intrinsics with the new
type.
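
For reviewers less familiar with the format: bfloat16 keeps the sign, the
8 exponent bits and the top 7 mantissa bits of an IEEE binary32 value, so
it is a 16-bit type like FP16. The sketch below shows that layout in plain
C on raw bit patterns. It is only an illustration, not the code in this
patch: the helper names are hypothetical, it deliberately avoids __bf16 so
it builds with any compiler, and the round-to-nearest-even and NaN
handling are assumptions of the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of this patch): convert a float to a
 * bfloat16 bit pattern, rounding to nearest with ties to even. */
static uint16_t float_to_bf16_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* type-pun without UB */

    /* NaN: keep the upper half and force a quiet-NaN mantissa bit so the
     * result cannot collapse into an infinity. */
    if ((bits & 0x7fffffffu) > 0x7f800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Add 0x7fff plus the lowest kept bit; this rounds halfway cases
     * toward the even 16-bit result. */
    uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding_bias) >> 16);
}

/* Hypothetical helper: widen a bfloat16 bit pattern back to float.
 * This direction is exact: the low 16 mantissa bits become zero. */
static float bf16_bits_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, float_to_bf16_bits(1.0f) yields 0x3f80, and widening 0x3f80
back with bf16_bits_to_float gives exactly 1.0f.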
Please note that we will hit the same problem in compiler-rt as we did
with FP16; I'll change that part in this patch too.
I'm posting this patch mainly to check whether this is the right
direction. We could also follow what we did for FP16, but I think that
is a bit heavy for BF16.