We currently extract and convert a top lane of an f16 vector using a VMOVX;VCVTB pair. This can be simplified to a single VCVTT. The pattern is mostly copied from a vector extract pattern, but produces a VCVTTHS directly, giving the f32 result.
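As a rough illustration of why the two sequences are equivalent, here is a small C model of the three instructions acting on a single 32-bit S register that holds two f16 lanes. It is only a sketch of the architectural behaviour as I understand it, not the TableGen pattern from this patch; the helper names (`half_to_float`, `vmovx`, `vcvtb_f32_f16`, `vcvtt_f32_f16`, `extract_top_old`, `extract_top_new`) are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 binary16 bit pattern to a float (exact, no rounding needed). */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (man == 0) {
            bits = sign;                        /* signed zero */
        } else {
            int shift = -1;                     /* subnormal: renormalise the mantissa */
            do { man <<= 1; shift++; } while (!(man & 0x400u));
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13); /* infinity or NaN */
    } else {
        bits = sign | ((exp + 127 - 15) << 23) | (man << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Assumed model of VMOVX.F16: move the top half of s to the bottom half, zeroing the top. */
static uint32_t vmovx(uint32_t s) { return s >> 16; }

/* Assumed model of VCVTB.F32.F16: convert the bottom f16 half of s to f32. */
static float vcvtb_f32_f16(uint32_t s) { return half_to_float((uint16_t)(s & 0xFFFFu)); }

/* Assumed model of VCVTT.F32.F16: convert the top f16 half of s to f32. */
static float vcvtt_f32_f16(uint32_t s) { return half_to_float((uint16_t)(s >> 16)); }

/* Old lowering: VMOVX followed by VCVTB. */
static float extract_top_old(uint32_t s) { return vcvtb_f32_f16(vmovx(s)); }

/* New lowering: a single VCVTT. */
static float extract_top_new(uint32_t s) { return vcvtt_f32_f16(s); }
```

In this model, a register holding the f16 values 2.0 (bottom) and 1.0 (top), i.e. `0x3C004000`, yields 1.0 from both `extract_top_old` and `extract_top_new`, which is the equivalence the new pattern relies on.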
I had to move some code around so that ARMInstrVFP has access to the required pattern frags, which were previously part of ARMInstrNEON. I could also split the pattern into separate MVE and NEON patterns if that is preferable. The v8f16 handling is currently a bit "MVE-y", but seems to apply fine for NEON.