This adds MVE vmull patterns, which are conceptually the same as mul(vmovl, vmovl), and so the tablegen patterns follow the same structure.
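As a rough illustration of that equivalence, here is a minimal Python sketch that models the signed "bottom" form on plain lists. The helper names (`sext`, `vmovlb_s`, `vmullb_s`, `mul_of_vmovl`) are hypothetical and exist only for this example; they are not LLVM or intrinsic names, and "bottom" is taken to mean the even-numbered lanes.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def vmovlb_s(vec, bits):
    """Model of a signed widening move: sign-extend the even-numbered lanes."""
    return [sext(x, bits) for x in vec[0::2]]

def vmullb_s(a, b, bits):
    """Model of a signed widening multiply of the even-numbered lanes."""
    return [sext(x, bits) * sext(y, bits) for x, y in zip(a[0::2], b[0::2])]

def mul_of_vmovl(a, b, bits):
    """The same result written as mul(vmovl(a), vmovl(b))."""
    return [x * y for x, y in zip(vmovlb_s(a, bits), vmovlb_s(b, bits))]

# Eight i8 lanes per input; only lanes 0, 2, 4 and 6 take part.
a = [100, -3, -128, 7, 127, 0, -1, 5]
b = [50, 9, -128, 2, 127, 1, -1, 5]

result = vmullb_s(a, b, 8)
assert result == mul_of_vmovl(a, b, 8)
print(result)  # [5000, 16384, 16129, 1]
```

Each product fits in the doubled element width, which is why a single widening multiply can stand in for the extend-then-multiply sequence.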
For i8 and i16 this is simple enough, but in the i32 version the multiply (in 64 bits) is illegal, meaning we need to catch the pattern earlier in a DAG fold. Because bitcasts are involved in the zext versions, the patterns are a little different in little and big endian, so I have only added little endian support in this patch.
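To make the endianness point concrete, the sketch below models a v4i32 to v2i64 bitcast as a store followed by a load, then applies an AND mask to each 64-bit lane. It assumes the mask keeps the low 32 bits of each lane (`LOW_MASK`); that value and the helper `bitcast_v4i32_to_v2i64` are assumptions made for illustration, not taken from the patch.

```python
import struct

def bitcast_v4i32_to_v2i64(lanes, byteorder):
    """Reinterpret four 32-bit lanes as two 64-bit lanes via their byte layout."""
    prefix = "<" if byteorder == "little" else ">"
    raw = struct.pack(prefix + "4I", *lanes)       # "store" as v4i32
    return list(struct.unpack(prefix + "2Q", raw))  # "load" as v2i64

lanes = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
LOW_MASK = 0xFFFFFFFF  # assumed mask: keep the low 32 bits of each i64 lane

le = [x & LOW_MASK for x in bitcast_v4i32_to_v2i64(lanes, "little")]
be = [x & LOW_MASK for x in bitcast_v4i32_to_v2i64(lanes, "big")]

print([hex(x) for x in le])  # ['0x11111111', '0x33333333']
print([hex(x) for x in be])  # ['0x22222222', '0x44444444']
```

Under these assumptions the same mask selects the even-numbered (bottom) i32 lanes in little endian and the odd-numbered (top) lanes in big endian, which is one way to see why the two cases would need different patterns.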
I'm a bit confused here... this looks like the AND mask is taking the 'top' parts of the elements, and if so, why don't we have to handle a 'bottom' element mask? Is this to do with the revs I see in the isel patterns?