This patch brings better splat-matching to our VP support, by sinking
splat operands of VP intrinsics back into the same block as the VP
operation. The list of VP intrinsics we are interested in matches that
of the regular instructions.
Some optimizations are still missing. For instance, our VL nodes aren't
recognized as commutative, so splats must be on the RHS. Because of
this, we limit our sinking of splats to just the RHS operand for now.
Improvement in this regard can come in another patch.
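For readers skimming the summary, the RHS-only restriction can be sketched as a toy predicate. This is not the code in the patch: the function name `shouldSinkSplatOperand` and the set of intrinsic names are hypothetical, and the list is only an assumed subset of the binary VP intrinsics. The sketch relies on the VP operand order (lhs, rhs, mask, evl), so the RHS is operand index 1.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper, for illustration only. The intrinsic names are an
// assumed subset; the real list in the patch mirrors the regular instructions.
bool shouldSinkSplatOperand(const std::string &Intrinsic, unsigned OperandIdx) {
  static const std::unordered_set<std::string> Supported = {
      "llvm.vp.add", "llvm.vp.mul",  "llvm.vp.and", "llvm.vp.or",
      "llvm.vp.xor", "llvm.vp.fadd", "llvm.vp.fmul"};

  // VP binary intrinsics take (lhs, rhs, mask, evl). Only the RHS (index 1)
  // is considered for now, since a splat on the LHS would not be matched.
  const unsigned RHSIdx = 1;
  return Supported.count(Intrinsic) != 0 && OperandIdx == RHSIdx;
}
```

Under these assumptions, a splat feeding operand 1 of `llvm.vp.add` would be sunk, while the same splat feeding operand 0 would be left where it is.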
Aha, I see what's going on here. Our VL patterns are commutative, but only the unmasked ones. The masked ones use V0, which is a subclass of Register, so it is skipped when GenerateVariantsOf in CodeGenDAGPatterns counts the non-register children (NC). Then, since NC != N->getNumChildren(), the commutative variants aren't generated.
I don't fully understand this part about Register leaves, but isn't it sufficient to check that the first 2 or 3 operands (the actual commutable operands) aren't Registers, and let the tail operands do as they wish?
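To make the two checks concrete, here is a toy model of the behaviour discussed above. It is only a sketch: `Child`, `commutesWithStrictCheck` and `commutesWithRelaxedCheck` are hypothetical names, not TableGen's actual types or functions, and the model ignores the commutative-intrinsic case where the first operand is the intrinsic ID.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pattern child; not TableGen's TreePatternNode.
struct Child {
  bool IsRegisterLeaf; // true for a leaf naming a physical register, e.g. V0
};

// Count the children that are not register leaves (the "NC" value above).
std::size_t countNonRegisterChildren(const std::vector<Child> &Children) {
  std::size_t NC = 0;
  for (const Child &C : Children)
    if (!C.IsRegisterLeaf)
      ++NC;
  return NC;
}

// Behaviour described in the thread: commuted variants are generated only
// when no child at all is a register leaf.
bool commutesWithStrictCheck(const std::vector<Child> &Children) {
  return countNonRegisterChildren(Children) == Children.size();
}

// Suggested relaxation: only the two commutable operands must be
// non-registers; the tail operands (mask, VL) are not inspected.
bool commutesWithRelaxedCheck(const std::vector<Child> &Children) {
  assert(Children.size() >= 2 && "commutative node needs two operands");
  return !Children[0].IsRegisterLeaf && !Children[1].IsRegisterLeaf;
}
```

Assuming a masked pattern has children (lhs, rhs, V0, vl), the strict check rejects it because NC is 3 out of 4 children, while the relaxed check accepts it because V0 is not one of the first two operands.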