These instructions make a vector of <4 x float> by widening every
other lane of a vector of <8 x half>.
I wondered about representing these using standard IR, along the lines
of a shufflevector to extract elements of the input into a <4 x half>,
followed by an fpext to turn that into <4 x float>. But it looks as if
that would take a lot of work in isel lowering to match any pattern I
could sensibly write in TableGen. I also haven't been able to think of
any other case where that pattern might be generated in IR, so doing
it that way wouldn't bring any extra code generation win.
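
For illustration, the standard-IR form described above would look
roughly like the sketch below. The function name is hypothetical, and
picking the odd-numbered lanes is an assumption made for the example;
the other variant would use the indices 0, 2, 4, 6 instead.

```llvm
; Sketch only, not code from this patch: the shufflevector + fpext
; alternative. @widen_odd_lanes is a hypothetical name.
define <4 x float> @widen_odd_lanes(<8 x half> %in) {
  ; Extract every other lane (here lanes 1, 3, 5, 7) into a <4 x half>.
  %narrow = shufflevector <8 x half> %in, <8 x half> undef,
                          <4 x i32> <i32 1, i32 3, i32 5, i32 7>
  ; Widen each extracted half to float.
  %wide = fpext <4 x half> %narrow to <4 x float>
  ret <4 x float> %wide
}
```

Selecting a single instruction from this would mean recognising the
shuffle and the fpext together during isel, which is the lowering work
mentioned above.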
Therefore, I've just used another target-specific intrinsic. We can
always switch to the standard-IR representation later if anyone thinks
of a good reason.
(In order to put the intrinsic definition near similar things in
IntrinsicsARM.td, I've also lifted the definition of the
MVEMXPredicated multiclass higher up the file, without changing it.)