This change adds two FP16 extraction patterns and two FP16 insertion
patterns (one per possible vector length).

Extractions are handled by copying a Q/D register into a VFP2-class
register, where individual FP32 (S) sub-registers can be accessed.
Extracting an even lane is then a simple sub-register extraction: the
FP16 value already sits in the bottom half of an S sub-register, and we
don't care about the top parts of registers for FP16 operations. Odd
lanes need an additional VMOVX instruction to move the top half of the
S sub-register into its bottom half.
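
As a rough illustration of the lane arithmetic above, the Python sketch
below models a D or Q register as a plain integer (lane 0 in the least
significant bits, S sub-register k covering bits 32k to 32k+31) and a
simplified VMOVX that shifts the top half of an S value into its bottom
half. The function names are hypothetical; this is a behavioural model
under those assumptions, not the tablegen patterns themselves.

```python
import struct


def pack_lanes(values):
    """Pack FP16 values into a register-sized integer (lane 0 least significant)."""
    raw = struct.pack("<%de" % len(values), *values)
    return int.from_bytes(raw, "little")


def extraction_plan(lane, num_lanes):
    """Return (S sub-register index, needs_vmovx) for an FP16 lane."""
    if num_lanes not in (4, 8):
        raise ValueError("expected a 64-bit (4 lanes) or 128-bit (8 lanes) vector")
    if not 0 <= lane < num_lanes:
        raise IndexError("lane out of range")
    # Two FP16 lanes share one 32-bit S sub-register; odd lanes live in its top half.
    return lane // 2, lane % 2 == 1


def s_subreg(reg, index):
    """Read S sub-register `index` of the modelled register."""
    return (reg >> (32 * index)) & 0xFFFFFFFF


def vmovx(s_value):
    """Simplified VMOVX.F16: top 16 bits of the source become the bottom 16 bits."""
    return s_value >> 16


def extract_lane(reg, lane, num_lanes):
    """Extract an FP16 lane: sub-register read, plus VMOVX for odd lanes."""
    index, needs_vmovx = extraction_plan(lane, num_lanes)
    s = s_subreg(reg, index)
    if needs_vmovx:
        s = vmovx(s)
    # FP16 operations only look at the bottom 16 bits; the top half is "don't care".
    bits = s & 0xFFFF
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]
```

For example, `extract_lane(pack_lanes([1.0, 2.0, 3.0, 4.0]), 1, 4)`
reads S sub-register 0 and applies the VMOVX step, giving 2.0, while
lane 2 is just the bottom half of S sub-register 1.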

Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
  works with odd lanes)
* The patterns for odd lanes would form a DAG (not a tree), and would
  not be implementable in pure tablegen

Because of this, insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).

64-bit operations are handled by the existing i16 machinery, with some
conversion between GPRs and FP registers (VMOVRH and VMOVHR
instructions). 128-bit vectors are handled similarly, but the operation
is only performed on one of the two 64-bit parts (using INSERT_SUBREG
and EXTRACT_SUBREG with appropriate lane index conversions).
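
A similar hypothetical model of the insertion path: the FP16 value is
reinterpreted as a 16-bit integer (standing in for the move to a GPR)
and written into a lane the way a 16-bit integer insertion would. For
128-bit vectors, only the 64-bit half containing the lane is extracted,
updated and inserted back. All names here are illustrative assumptions;
the point is the lane index conversion.

```python
import struct

MASK64 = (1 << 64) - 1


def half_to_gpr(value):
    """Reinterpret an FP16 value as its 16-bit pattern (models the FP-to-GPR move)."""
    return int.from_bytes(struct.pack("<e", value), "little")


def insert_i16(d_reg, lane, bits):
    """16-bit integer insertion into lane 0..3 of a 64-bit register value."""
    if not 0 <= lane < 4:
        raise IndexError("a 64-bit register has 4 lanes")
    shift = 16 * lane
    cleared = d_reg & ~(0xFFFF << shift)
    return cleared | ((bits & 0xFFFF) << shift)


def insert_f16(reg, lane, value, num_lanes):
    """Insert an FP16 value by going through the 16-bit integer path."""
    if num_lanes not in (4, 8):
        raise ValueError("expected 4 or 8 FP16 lanes")
    if not 0 <= lane < num_lanes:
        raise IndexError("lane out of range")
    bits = half_to_gpr(value)
    if num_lanes == 4:
        return insert_i16(reg, lane, bits)
    # Lane index conversion: which 64-bit half, and which lane inside it.
    half, sub_lane = divmod(lane, 4)
    shift = 64 * half
    d = (reg >> shift) & MASK64                          # like EXTRACT_SUBREG
    d = insert_i16(d, sub_lane, bits)
    return (reg & ~(MASK64 << shift)) | (d << shift)     # like INSERT_SUBREG


def unpack_lanes(reg, num_lanes):
    """Decode a register value back into FP16 lanes (lane 0 least significant)."""
    raw = reg.to_bytes(2 * num_lanes, "little")
    return list(struct.unpack("<%de" % num_lanes, raw))
```

Splitting the index with `divmod(lane, 4)` mirrors performing a 128-bit
insertion on just one of its 64-bit parts; lane 5, for instance, becomes
lane 1 of the upper half.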

Without these patterns the ARM backend would sometimes fail during
instruction selection.

This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1 instruction
* an FP16 load and an insertion into a single VLD1 instruction
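
To make the combined patterns concrete, here is a toy rewriter over
tuple-encoded DAG nodes. The node names (`extract_elt`, `load`, `store`,
`insert_elt`, `vst1_lane`, `vld1_lane`) and operand orders are invented
for this sketch and do not correspond to LLVM's SelectionDAG API; in the
patch the combinations are expressed as instruction-selection patterns.

```python
# Hypothetical node encodings used only in this sketch:
#   ("extract_elt", elt_type, vec, lane)
#   ("load", elt_type, addr)
#   ("store", value, addr)
#   ("insert_elt", vec, value, lane)


def combine(node):
    """Fuse FP16 extract+store into vst1_lane and FP16 load+insert into vld1_lane."""
    op = node[0]
    if op == "store":
        _, value, addr = node
        if value[0] == "extract_elt" and value[1] == "f16":
            _, _, vec, lane = value
            return ("vst1_lane", vec, lane, addr)
    elif op == "insert_elt":
        _, vec, value, lane = node
        if value[0] == "load" and value[1] == "f16":
            _, _, addr = value
            return ("vld1_lane", vec, lane, addr)
    # Anything else is left untouched.
    return node
```

Nodes with other element types or shapes are returned unchanged, so the
rewrite fires only on the two FP16 combinations listed above.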