This change permits scalar bfloats to be loaded, stored, moved and
used as function call arguments and return values, whenever the bf16
feature is supported by the subtarget.
Previously that was only supported in the presence of the fullfp16
feature, because the code generation strategy depended on instructions
from that extension. This change adds alternative code generation
strategies so that those operations can be done even without fullfp16.
The strategy for loads and stores is to replace VLDRH/VSTRH with
integer LDRH/STRH plus a move between register classes. I've written
isel patterns for those, conditional on not having the fullfp16
feature (so that in the fullfp16 case, the existing patterns will
still be used).
For function arguments and returns, instead of writing isel patterns
to match VMOVhr and VMOVrh, I've avoided generating those SDNodes
in the first place, by factoring out the code that constructs them
into helper functions MoveToHPR and MoveFromHPR which have a
fallback for non-fullfp16 subtargets.
The current output code is not especially pretty: in the new test file
you can see unnecessary store/load pairs implementing no-op bitcasts,
and lots of pointless moves back and forth between FP registers and
GPRs. But it at least works, which is an improvement on the previous
situation.
I'm not sure if setAllExpand should be guarded by !hasFullFP16? In either case they would be in pretty much the same situation - most operations are illegal with bf16. Does something go wrong always doing this?