Implement traversal algorithm to match trees to i8 vperms. For ors that can be combined into perms, we expect to see some pattern that combine four 8 bit operands (actually 16 bit operands, with only 8 nonzero bits) into two 16 bit operand, and combine thes two 16 bit operands via the or (after an zext, and ext-shift), The trees that do this type of combination are one of the two classes of trees relevant, and are matched in calculateByteProvider. The 8 bit operands used in this tree are typically produced via an AND op or a SRL op, and are the leaves of the trees in calculateByteProvider. The other relevant class of trees are those that map a leaf of calculateByteProvider to an ultimate source. This class of trees is matched in calculateSrcByte.
Through this recusive process, we track an Index (SrcIndex in calculateSrcByte) which is the byte of the current op that maps to the byte of the dest of the or we are currently mapping. For example, the 4th byte of the dest of SHL Src, 16 maps to the 2nd byte of Src. Through basic rules like this we can map src bytes to the dest byte of the or. Using this mapping we can create perm masks.
Much of the code for calculateByteProvider was borrowed from CodeGen/SelectionDAG/DAGCombiner.cpp (MatchLoadCombine). There are still many candidate trees that can be matched into perms that this patch does not attempt to. Those are saved for future iterations.
Depends on: https://reviews.llvm.org/D143018
Can you keep this as a generic utility?