Currently, clang miscompiles operations on __fp16 vectors.
For example, when the following code is compiled:
  typedef __fp16 half4 __attribute__ ((vector_size (8)));
  half4 hv0, hv1, hv2;

  void test() {
    hv0 = hv1 + hv2;
  }
clang generates the following IR on ARM64:
  %1 = load <4 x half>, <4 x half>* @hv1, align 8
  %2 = load <4 x half>, <4 x half>* @hv2, align 8
  %3 = fadd <4 x half> %1, %2
  store <4 x half> %3, <4 x half>* @hv0, align 8
This is incorrect: __fp16 values in C or C++ expressions must be promoted to float when fp16 is not a natively supported type (see gcc's documentation):
https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
The IR is incorrect on X86 too. There the addition is done on <4 x i16> vectors, i.e. as an integer add of the raw bit patterns:
  %1 = load <4 x i16>, <4 x i16>* @hv1, align 8
  %2 = load <4 x i16>, <4 x i16>* @hv2, align 8
  %3 = add <4 x i16> %2, %1
  store <4 x i16> %3, <4 x i16>* @hv0, align 8
This patch makes the changes needed in Sema and IRGen to generate the correct IR on targets that set HalfArgsAndReturns to true but don't support fp16 natively (ARM and ARM64). It inserts implicit casts to promote fp16 vector operands to float vectors and truncate the result back to a __fp16 vector.
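To make the intended semantics concrete, here is a rough, portable C sketch of what "promote each lane to float, operate, truncate back" means. It is an emulation for illustration only, not the code the patch adds: `half4_bits`, `half_to_float`, `float_to_half`, `half4_add_promoted` and `half4_add_as_i16` are hypothetical names, the lanes are stored as raw IEEE binary16 bit patterns in `uint16_t`, and round-to-nearest-even is assumed for the truncation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 4-lane __fp16 vector: raw binary16 bits. */
typedef struct { uint16_t lane[4]; } half4_bits;

/* Promote one binary16 bit pattern to float (exact for every input). */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    float f;
    if (exp == 0x1Fu) {             /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {          /* normal: rebias exponent 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {          /* signed zero */
        bits = sign;
    } else {                        /* subnormal half: normalize it */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) { man <<= 1; e--; }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }
    assert(sizeof f == sizeof bits); /* assumes 32-bit IEEE float */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Truncate float to binary16 bits, rounding to nearest even (assumed). */
static uint16_t float_to_half(float f) {
    uint32_t x;
    assert(sizeof f == sizeof x);
    memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t mag = x & 0x7FFFFFFFu;
    if (mag > 0x7F800000u) return (uint16_t)(sign | 0x7E00u);  /* NaN */
    if (mag >= 0x47800000u) return (uint16_t)(sign | 0x7C00u); /* >= 2^16 or inf */
    if (mag < 0x33000000u) return sign;       /* below 2^-25: rounds to zero */
    int exp = (int)(mag >> 23) - 127;         /* unbiased exponent, >= -25 */
    uint32_t man = (mag & 0x7FFFFFu) | 0x800000u;
    int shift = exp >= -14 ? 13 : -exp - 1;   /* subnormal results shift more */
    uint32_t half = exp >= -14 ? (uint32_t)(exp + 14) << 10 : 0u;
    half += man >> shift;  /* for normals the implicit 1 bumps the exponent */
    uint32_t rem = man & ((1u << shift) - 1u);
    uint32_t tie = 1u << (shift - 1);
    if (rem > tie || (rem == tie && (half & 1u))) half++;
    return (uint16_t)(sign | half);
}

/* Intended semantics: promote lanes, add in float, truncate the result. */
static half4_bits half4_add_promoted(half4_bits a, half4_bits b) {
    half4_bits r;
    for (int i = 0; i < 4; i++) {
        float sum = half_to_float(a.lane[i]) + half_to_float(b.lane[i]);
        r.lane[i] = float_to_half(sum);
    }
    return r;
}

/* What an integer add on <4 x i16> computes instead: sums of bit patterns. */
static half4_bits half4_add_as_i16(half4_bits a, half4_bits b) {
    half4_bits r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (uint16_t)(a.lane[i] + b.lane[i]);
    return r;
}
```

With lanes holding 1.0 (0x3C00) and 2.0 (0x4000), `half4_add_promoted` yields 0x4200, which is 3.0, while `half4_add_as_i16` yields 0x7C00, which is the bit pattern of +infinity.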
I plan to fix X86 and other targets that don't set HalfArgsAndReturns to true in another patch.