This patch extends performMaskedGatherScatterCombine to find gathers
and scatters whose indices have a stride of two. These can be converted
to a pair of contiguous loads or stores, combined with zip and uzp
operations and the appropriate predicates.
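
As an illustration of the gather case, the following is a minimal scalar
sketch, not the code in this patch. It assumes all lanes are active, so
predicates are ignored, and it assumes a uzp1-style "keep the even-numbered
elements" step. The helper names gatherStride2, uzp1 and gatherViaContiguous
are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference behaviour: a gather whose indices are 0, 2, 4, ... (stride two).
std::vector<int> gatherStride2(const std::vector<int> &Mem, std::size_t N) {
  assert(Mem.size() >= 2 * N && "sketch assumes 2*N readable elements");
  std::vector<int> Out(N);
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = Mem[2 * I];
  return Out;
}

// Scalar model of a uzp1-style operation (hypothetical helper): concatenate
// Lo and Hi, then keep the even-numbered elements of the result.
std::vector<int> uzp1(const std::vector<int> &Lo, const std::vector<int> &Hi) {
  std::vector<int> Cat(Lo);
  Cat.insert(Cat.end(), Hi.begin(), Hi.end());
  std::vector<int> Out;
  for (std::size_t I = 0; I < Cat.size(); I += 2)
    Out.push_back(Cat[I]);
  return Out;
}

// The same result from two contiguous loads of N elements each, followed by
// the de-interleaving step. Predication is not modelled here.
std::vector<int> gatherViaContiguous(const std::vector<int> &Mem,
                                     std::size_t N) {
  assert(Mem.size() >= 2 * N && "sketch assumes 2*N readable elements");
  auto Mid = Mem.begin() + static_cast<std::ptrdiff_t>(N);
  auto End = Mem.begin() + static_cast<std::ptrdiff_t>(2 * N);
  std::vector<int> Lo(Mem.begin(), Mid); // first contiguous load
  std::vector<int> Hi(Mid, End);         // second contiguous load
  return uzp1(Lo, Hi);
}
```

The scatter case is the mirror image: the data is interleaved with a zip
before being written with two contiguous stores, and the predicates ensure
that only the lanes the original scatter would have written are stored.
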
No performance improvement was found when using this combine for scatter
stores of 64-bit data, so we just return SDValue() in this case.