Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.
This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64-bit shuffle mask elements: the bit masking wasn't being computed correctly. A similar issue exists with X86ISD::VPERMV nodes and will need to be fixed when shuffle combining is set up for them as well, but I will create repro test cases before pushing another patch for review.
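Let's assume that VT is v16i32, so RawMask.size() = 16.
log2(16 * 2) = log2(32) = 5
So this does M &= 5? That keeps only bits 0 and 2 of the index, not its low 5 bits.
I suppose it should be: Mask &= (RawMask.size() * 2 - 1), which is M &= 31 here.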
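
To make the arithmetic concrete, here is a minimal standalone sketch, not the code from the patch. The helper names (log2u, maskWithBitCount, maskWithIndexRange, decodeTwoSourceMask) are hypothetical and exist only for illustration; the sketch assumes a two-source shuffle whose element count is a power of two, so valid indices are 0 .. 2 * NumElts - 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: floor(log2(V)) for V >= 1.
constexpr unsigned log2u(unsigned V) { return V <= 1 ? 0 : 1 + log2u(V / 2); }

// The questioned form: ANDs the index with the *number of bits* (5 for
// v16i32), which keeps only bits 0 and 2 of the index.
constexpr uint64_t maskWithBitCount(uint64_t M, unsigned NumElts) {
  return M & log2u(NumElts * 2);
}

// The suggested form: ANDs the index with (2 * NumElts - 1), which keeps
// the low log2(2 * NumElts) bits (0x1F for v16i32).
constexpr uint64_t maskWithIndexRange(uint64_t M, unsigned NumElts) {
  return M & (NumElts * 2 - 1);
}

// Illustrative decode of a raw two-source mask using the suggested form.
// Indices 0 .. NumElts-1 pick from the first source, NumElts .. 2*NumElts-1
// from the second; higher bits of each raw element are ignored.
std::vector<int> decodeTwoSourceMask(const std::vector<uint64_t> &RawMask) {
  const unsigned NumElts = static_cast<unsigned>(RawMask.size());
  // Assumption: element count is a non-zero power of two.
  assert(NumElts != 0 && (NumElts & (NumElts - 1)) == 0);
  std::vector<int> ShuffleMask;
  ShuffleMask.reserve(NumElts);
  for (uint64_t M : RawMask)
    ShuffleMask.push_back(static_cast<int>(maskWithIndexRange(M, NumElts)));
  return ShuffleMask;
}
```

With NumElts = 16, a raw index of 17 (element 1 of the second source) survives the suggested masking unchanged, while masking with the bit count collapses it to 1, silently redirecting the lane to the first source.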