This adds another tablegen fold that converts an i16 odd-lane-insert of an even-lane-extract into a VINS. We extract the existing f32 value from the destination register and VINS the new value into it. The rest of the backend then is able to optimize the INSERT_SUBREG/COPY_TO_REGCLASS/EXTRACT_SUBREG.
Hmm. I added it as a typecast, essentially. Otherwise the INSERT_SUBREG fails to make it through the tablegen type checks. Trying it as (INSERT_SUBREG (v4f32 MQPR:$src1), ... gives an error that looks like the insertsubreg has conflicting input types
Ah, I see – it's not that you needed to convert from MQPR to MQPR, it's just that a side effect of that is that you also get to convert from v8i16 to v4f32, which was what you really needed.
In that case, is that implicit conversion from v8i16 to v4f32 acting as a bitcast, or a VECTOR_REG_CAST? Does this pattern do the right thing when tested big-endian?
The COPY machine instruction (which is what a COPY_TO_REGCLASS will eventually turn into) loses any type info so will always act as a VECTOR_REG_CAST. So should be fine in big and little endian - and seems to be OK in my testing.