The vectorizer can sometimes make reverse shuffles from indices that count down. In MVE, we don't have a 128bit rev instruction, but we can select this to a VREV64 with some lane movs to swap the two halfs.
Ideally this would use VMOVD's, but only gets as far as VMOVS's at the moment.
I think keeping some kind of assert is a good idea here.