Given a bundle of binary operations, we normally try to vectorize their operands by bundling all operand-zero values together and bundling all operand-one values together. Thus, assuming all operands are instructions, each operand bundle must have the same opcode to be vectorizable. If this is not the case, the operand bundles must be gathered. Instead of gathering, we can try to "transpose" the operand bundles such that each resulting bundle will have a single opcode.
This patch adds the ability to transpose binary operand bundles to enable vectorization. When we transpose the operand bundles, we bundle together all operands of one or more instructions, rather than bundling together a single operand (e.g, operand-zero) of several instructions. This bundling is similar to that performed by tryToVectorizePair. If a transpose is performed while building a vectorizable tree, we must re-transpose the vectors at run-time to restore the correct bundling. The transpose is performed with shufflevector instructions that read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The cost of these added shufflevector instructions is included in the cost model.
Please refer to the modified test cases for examples.
Could createStrideMask be (re)used?