This is the groundwork for Armv8.2-A FP16 code generation.
Clang passes and returns _Float16 values as floats, together with the
bitcasts and truncs required to implement correct AAPCS behaviour; see D42318.
We will implement half-precision argument passing/returning lowering
in the ARM backend soon, but for now this means that this:
    _Float16 sub(_Float16 a, _Float16 b) {
      return a + b;
    }

gets lowered to this:

    define float @sub(float %a.coerce, float %b.coerce) {
    entry:
      %0 = bitcast float %a.coerce to i32
      %tmp.0.extract.trunc = trunc i32 %0 to i16
      %1 = bitcast i16 %tmp.0.extract.trunc to half
      <SNIP>
      %add = fadd half %1, %3
      <SNIP>
    }
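
For readers less used to reading IR, the argument coercion above can be
mimicked in plain C. This is only a sketch of the bit manipulation, not what
Clang or the backend actually emits. Both helper names are hypothetical. The
return direction is elided (<SNIP>) in the IR, so the zero-extending inverse
shown here is an assumption made purely to give a round trip.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: recover the 16-bit half-precision payload from a
   float whose low 16 bits carry it, mirroring
   bitcast float -> i32, then trunc i32 -> i16. */
static uint16_t half_bits_from_coerced_float(float coerced) {
    uint32_t word;
    assert(sizeof word == sizeof coerced);
    memcpy(&word, &coerced, sizeof word);   /* bitcast float -> i32 */
    return (uint16_t)word;                  /* trunc i32 -> i16 */
}

/* Hypothetical inverse (assumption: upper 16 bits are zero-filled). */
static float coerced_float_from_half_bits(uint16_t bits) {
    uint32_t word = bits;                   /* widen i16 -> i32 */
    float coerced;
    assert(sizeof word == sizeof coerced);
    memcpy(&coerced, &word, sizeof coerced); /* bitcast i32 -> float */
    return coerced;
}
```

The final bitcast from i16 to half has no portable C spelling, which is why
the sketch stops at the raw 16-bit pattern.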
When FullFP16 is *not* supported, we don't make f16 a
legal type, so we get legalization for "free": nothing changes
and everything works as before, including f16 argument
passing/returning.
When FullFP16 is supported, we do make f16 a legal type,
and there are 2 places that need patching up: f16 argument passing and
f16 returning. Both involve minor tweaks to avoid unnecessary code
generation for some bitcasts.
As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description, showing
that we can codegen this instruction from IR, and, more importantly, to some
conversion instructions. These conversions were causing issues before in the FP16
and FullFP16 cases.
I've also added match rules to the VLDRH and VSTRH descriptions, so that we can
actually compile the entire half-precision sub code example above. This showed that
these loads and stores had the wrong addressing mode specified: AddrMode5 instead
of AddrMode5FP16, which turned out not to be implemented at all, so that has also
been added.
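
To illustrate why the addressing mode matters, here is a small C sketch of the
offset check. It assumes the architectural encoding in which VLDR/VSTR take an
8-bit immediate scaled by the access size: 4 bytes for the single/double
precision forms, 2 bytes for the half-precision forms. The function names are
hypothetical and do not correspond to functions in the ARM backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if byte_offset can be encoded as an 8-bit
   immediate multiplied by `scale`, in either direction. */
static bool fits_scaled_imm8(int32_t byte_offset, int32_t scale) {
    assert(scale > 0);
    /* Widen before negating so INT32_MIN does not overflow. */
    int64_t magnitude = byte_offset < 0 ? -(int64_t)byte_offset : byte_offset;
    return magnitude % scale == 0 && magnitude / scale <= 255;
}

/* Assumption: AddrMode5 offsets are multiples of 4. */
static bool fits_addrmode5(int32_t byte_offset) {
    return fits_scaled_imm8(byte_offset, 4);
}

/* Assumption: AddrMode5FP16 offsets are multiples of 2. */
static bool fits_addrmode5fp16(int32_t byte_offset) {
    return fits_scaled_imm8(byte_offset, 2);
}
```

Under these assumptions, an offset of 2 bytes is valid for a half-precision
load but not expressible with the 4-byte scaling, which is the kind of
mismatch the wrong addressing mode would introduce.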
This is the minimal patch that shows all the different moving parts. In patch 2/3 I will
add some efficient lowering of bitcasts, and in 3/3 I will add the remaining Armv8.2-A
FP16 instruction descriptions.