Currently, performExtendCombine assumes that twice the source element
bit width is a valid MVT. This does not hold for i1, which causes a crash
on the v64i1 test cases added in this patch.
It turns out that performExtendCombine does not appear to be needed: the
same patterns are handled by other code, and we end up with the same results
even without the custom lowering. I also added further test cases in a50037aaa6d5df.
Let's just remove the unneeded code.
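
For illustration only, here is a minimal standalone sketch of the failure mode described above: doubling an element width and assuming the result is a usable type. It does not use LLVM's API. The names `kAssumedWidths`, `isAssumedValidWidth` and `doubledWidthIsValid` are hypothetical, and the width list is an assumption made for the example, not LLVM's actual MVT table.

```cpp
#include <cstddef>

// Hypothetical list of element widths treated as valid in this sketch.
// This is an assumption for illustration, not LLVM's real MVT table.
constexpr unsigned kAssumedWidths[] = {1, 8, 16, 32, 64, 128};

// Returns true if Bits is one of the widths assumed valid above.
bool isAssumedValidWidth(unsigned Bits) {
  for (std::size_t I = 0; I < sizeof(kAssumedWidths) / sizeof(kAssumedWidths[0]); ++I)
    if (kAssumedWidths[I] == Bits)
      return true;
  return false;
}

// Models the assumption in the removed combine: the widened element type
// has twice the source element width. For a 1-bit source this yields a
// 2-bit width, which is not in the assumed list.
bool doubledWidthIsValid(unsigned SrcBits) {
  return isAssumedValidWidth(SrcBits * 2);
}
```

Under these assumptions, `doubledWidthIsValid(8)` holds while `doubledWidthIsValid(1)` does not, which is the case the v64i1 tests hit.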
This looks odd, with the lanes being interchanged. I presume there is other code that converts the i1 vector into vectors across a call, and that it doesn't preserve the register order?