This patch enables forming vector extloads (introduced in D6904) for ARM.
It only does so for legal types, and only when the extension can't be folded into a wide/long form of the user instruction.
Enabling it for larger (illegal) types isn't as good an idea on ARM as it is on X86, because:
- we pretend that extloads are legal, but end up generating vld+vmovl, and
- we have instructions like vld {dN, dM}, which can't be generated when we "manually expand" extloads to vld+vmovl.
For instance, for a 16i16 -> 16i64 sextload, we generate something like:
    ...
    vld1.64 {d16, d17}, [r2:128]
    vmovl.s16 q9, d16
    vmovl.s16 q8, d17
    ...
Whereas with the combine enabled for illegal types, we would generate:
    ...
    vld1.32 {d18[0]}, [r1:32]
    ...
    vmovl.s16 q9, d18
    ...
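As a rough scalar model of what the vld1.64 + vmovl.s16 excerpt above computes: one 128-bit load brings in eight i16 lanes, and each vmovl.s16 sign-extends four of them to i32. This is only a sketch. The function name sextLoad8xI16 is hypothetical (not an LLVM or ARM API), and the mapping of lanes to d/q registers is ignored.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, for illustration only. Scalar model of:
//   vld1.64   {d16, d17}, [r2:128]   ; one load of 8 x i16
//   vmovl.s16 q9, d16                ; sign-extend lanes 0..3 to i32
//   vmovl.s16 q8, d17                ; sign-extend lanes 4..7 to i32
std::array<std::int32_t, 8> sextLoad8xI16(const void *src) {
  // The single wide load: all eight narrow lanes come in at once.
  std::array<std::int16_t, 8> narrow;
  std::memcpy(narrow.data(), src, sizeof(narrow));

  std::array<std::int32_t, 8> wide;
  // First vmovl.s16: low half (the lanes held in d16).
  for (std::size_t i = 0; i < 4; ++i)
    wide[i] = narrow[i]; // int16_t -> int32_t conversion sign-extends
  // Second vmovl.s16: high half (the lanes held in d17).
  for (std::size_t i = 4; i < 8; ++i)
    wide[i] = narrow[i];
  return wide;
}
```

The point of the model is that the extension is split across separate instructions after a single wide load, which is the shape that the per-lane vld1.32 {d18[0]} sequence loses.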
For legal types, the combine doesn't fire that often: in the integration tests, it only fires in a big-endian testcase, where it removes a pointless AND.