This addresses a significant portion of the regressions that would otherwise appear in D140677.
@RKSimon this seems obviously good overall,
but there are some dubious changes here, mostly for SSE2:
we fail to simplify some and/andn masks,
and we fail to pull identical target-specific shuffles out of commutative opcodes.
Please can you indicate which of the test changes should be dealt with?
Instead of the count(), it might be worth adding a bool arg to isSplat()/isSplatMask() so that they only match splats with more than one matching element?
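A minimal sketch of what such a flag could look like, assuming a plain shuffle mask where -1 marks an undef lane. `isSplatMaskSketch` and `RequireMultipleMatches` are hypothetical names, and the signature does not match LLVM's existing isSplat()/isSplatMask() helpers; the idea is only that the "more than one defined element" check moves into the helper instead of sitting in a count() at each call site.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch, NOT LLVM's actual isSplatMask() signature.
// A mask element of -1 means "undef"; any other value selects a source lane.
// The mask is a splat if every defined element selects the same source lane.
// With RequireMultipleMatches set, at least two defined elements must select
// that lane, so a mask like <0,-1,-1,-1> (one defined element) is rejected.
bool isSplatMaskSketch(const std::vector<int> &Mask,
                       bool RequireMultipleMatches = false) {
  int SplatIdx = -1;       // source lane shared by all defined elements
  unsigned NumMatches = 0; // number of defined elements seen so far
  for (int M : Mask) {
    // Simplification for this sketch: -1 is the only sentinel modeled.
    assert(M >= -1 && "only -1 (undef) is modeled as a sentinel");
    if (M < 0)
      continue; // an undef lane is compatible with any splat
    if (SplatIdx < 0)
      SplatIdx = M;
    else if (M != SplatIdx)
      return false; // two different source lanes: not a splat
    ++NumMatches;
  }
  // An all-undef mask counts as a splat unless multiple matches are required.
  return !RequireMultipleMatches || NumMatches > 1;
}
```

With the flag left at its default, the helper keeps the permissive behaviour (single-element and all-undef masks still count as splats), so only the call sites that care about real broadcasts would need to opt in.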