Simon pointed out that this function is doing a BITCAST, which can be incorrect on big-endian targets. This makes the lowering of VMOVN in MVE incorrect, and because the function is shared between Neon and MVE, both can be affected.
This attempts to fix things by using the newly added VECTOR_REG_CAST instead of the BITCAST. As VECTOR_REG_CAST may now be used on Neon, I've added the relevant patterns for it there too. I've also added a quick DAG combine for it, to remove these casts where possible.
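For readers unfamiliar with the distinction, here is a rough standalone sketch of why the two casts can disagree on big endian. It is a simplified model, not LLVM code: it assumes a BITCAST behaves like a store of the source type followed by a load of the destination type, and that a register cast leaves the register bytes untouched with lane 0 in the lowest bytes. The helper names `regCast` and `bitcastViaMemory` are hypothetical and only exist for this illustration.

```cpp
#include <cassert>   // used only by the usage check shown after this block
#include <cstdint>
#include <vector>

// Hypothetical model of a register cast from i16 lanes to i8 lanes.
// Assumption: lane i of the i16 vector occupies register bytes 2i (low half)
// and 2i+1 (high half), independent of the memory endianness.
std::vector<uint8_t> regCast(const std::vector<uint16_t> &lanes) {
  std::vector<uint8_t> out;
  for (uint16_t x : lanes) {
    out.push_back(static_cast<uint8_t>(x & 0xFF)); // low byte first
    out.push_back(static_cast<uint8_t>(x >> 8));   // then high byte
  }
  return out;
}

// Hypothetical model of a bitcast as "store as i16 lanes, reload as i8 lanes".
// Each i16 element is written to memory in the target's byte order, and the
// resulting i8 lane k is simply the byte at memory offset k.
std::vector<uint8_t> bitcastViaMemory(const std::vector<uint16_t> &lanes,
                                      bool bigEndian) {
  std::vector<uint8_t> mem;
  for (uint16_t x : lanes) {
    uint8_t lo = static_cast<uint8_t>(x & 0xFF);
    uint8_t hi = static_cast<uint8_t>(x >> 8);
    if (bigEndian) {
      mem.push_back(hi); // most significant byte at the lower address
      mem.push_back(lo);
    } else {
      mem.push_back(lo); // least significant byte at the lower address
      mem.push_back(hi);
    }
  }
  return mem;
}
```

Under this model the two casts agree on little endian and differ on big endian, which is the case where substituting one for the other changes the lanes an instruction sees:

```cpp
std::vector<uint16_t> v = {0x0102, 0x0304};
assert(bitcastViaMemory(v, /*bigEndian=*/false) == regCast(v));
assert(bitcastViaMemory(v, /*bigEndian=*/true) != regCast(v));
```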
This reference to MVE-specific instruction names might be out of place now that this comment is shared with Neon :-) But I don't know enough Neon to be sure what the analogous load/store instructions there look like. Are those VST1 / VLD1, perhaps?