This patch adds tablegen patterns for pairs of i16/f16 insert/extracts. If we are inserting into two adjacent vector lanes (0 and 1 for example), we can use either a vmov; vins or a vmovx; vins sequence to insert the pair together, avoiding a round-trip through GPR registers. These are quite large patterns with a number of EXTRACT_SUBREG/INSERT_SUBREG/COPY_TO_REGCLASS nodes, but as most of those become copies, they should hopefully be cleaned up by later optimizations.
The VINS pattern was also adjusted to allow it to represent that it is inserting into the top half of an existing register.
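To make the lane arithmetic concrete, here is a minimal Python sketch of how the two sequences combine a pair of 16-bit elements inside one 32-bit S register. It is not taken from the patch: the function names are hypothetical, and the semantics are my reading of the instructions (vmovx moves the top half of its source to the bottom half of the result and zeroes the top; vins writes the bottom half of its source into the top half of the destination and keeps the destination's bottom half).

```python
MASK16 = 0xFFFF

def vmov(sm: int) -> int:
    """Plain 32-bit register copy (models vmov between S registers)."""
    return sm & 0xFFFFFFFF

def vmovx(sm: int) -> int:
    """Assumed vmovx semantics: top half of sm -> bottom half, top half zeroed."""
    return (sm >> 16) & MASK16

def vins(sd: int, sm: int) -> int:
    """Assumed vins semantics: bottom half of sm -> top half of sd.

    The old value of sd is an input as well as the output, because its
    bottom 16 bits are preserved.
    """
    return ((sm & MASK16) << 16) | (sd & MASK16)

def insert_pair(lane0_src: int, lane0_in_top: bool, lane1_src: int) -> int:
    """Build one S register holding two adjacent 16-bit lanes.

    lane0_src holds the element for the even lane in either half;
    lane1_src holds the element for the odd lane in its bottom half.
    """
    # vmovx if the even-lane element sits in a top half, otherwise vmov.
    sd = vmovx(lane0_src) if lane0_in_top else vmov(lane0_src)
    # vins then overwrites only the top half, so the pair is formed in place.
    return vins(sd, lane1_src)

if __name__ == "__main__":
    print(hex(insert_pair(0x12345678, True, 0x9ABCDEF0)))
    print(hex(insert_pair(0x12345678, False, 0x9ABCDEF0)))
```

Under these assumptions, vins reads its destination as well as writing it, which is one way to see why a pattern for it would carry both the existing register and the new element as operands.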
It's unclear to me why we need 2 inputs to model that it is inserting into the top half of an existing register. I forget whether this is how it is usually done; is there precedent for this?