Some shuffles can be lowered to a masked blend instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ).
This patch adds a new pattern match for this case.
The pattern only matches zmm (512-bit) vectors, since in the other cases we already use a more efficient blend instruction that does not need a mask.
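
For reviewers less familiar with the instruction, here is a minimal standalone sketch of the matching idea. It is not the code from the patch. The helper names `blendMaskFromShuffle` and `maskedBlend` are hypothetical, and the sketch assumes the usual shufflevector convention (indices `0..N-1` select from the first operand, `N..2N-1` from the second, negative means undef) and that a set mask bit selects the element from the second source. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: if the shuffle only picks lane i from lane i of
// either input, return the k-mask describing the blend; otherwise nullopt.
std::optional<uint64_t> blendMaskFromShuffle(const std::vector<int> &shuffleMask) {
  const int n = static_cast<int>(shuffleMask.size());
  if (n == 0 || n > 64)
    return std::nullopt; // a k-register holds at most 64 bits
  uint64_t kMask = 0;
  for (int i = 0; i < n; ++i) {
    const int m = shuffleMask[i];
    if (m < 0 || m == i)
      continue; // undef lane, or lane taken from the first operand: bit stays 0
    if (m == i + n) {
      kMask |= uint64_t{1} << i; // lane taken from the second operand
      continue;
    }
    return std::nullopt; // element crosses lanes, so this is not a blend
  }
  return kMask;
}

// Scalar reference model of a masked blend on 32-bit elements:
// result[i] = bit i of k is set ? b[i] : a[i].
std::vector<int32_t> maskedBlend(uint64_t k, const std::vector<int32_t> &a,
                                 const std::vector<int32_t> &b) {
  assert(a.size() == b.size() && a.size() <= 64);
  std::vector<int32_t> result(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    result[i] = ((k >> i) & 1) ? b[i] : a[i];
  return result;
}
```

With four-element vectors, the shuffle mask `{0, 5, 2, 7}` is a blend with k-mask `0b1010`, while `{0, 4, 2, 7}` is rejected because lane 1 would read lane 0 of the second operand.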
You can use getVectorMaskingNode to simplify the code; all of the logic is already implemented there.