Try to keep simple v2s16 cases as-is. This will more naturally map to
how the VOP3P op_sel modifiers work compared to the expansion
involving bitcasts and bitshifts.
Wider source vector types could also be handled here, although a
pre-legalize combine may be a better place for that.
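
For context, the "bitcasts and bitshifts" expansion the comment refers to can be modeled on plain integers roughly as below. This is only a standalone sketch, not the legalizer code: `PackedV2S16`, `extractLane`, and `shuffleV2S16` are hypothetical names, and it assumes lane 0 lives in the low 16 bits of a 32-bit register. Keeping the shuffle intact avoids this shift/mask/or sequence when the operation can instead be expressed as a per-operand half select.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model: a v2s16 value packed into 32 bits, lane 0 in the low half.
using PackedV2S16 = std::uint32_t;

// Extract one 16-bit lane (0 = low half, 1 = high half) with a shift and truncate.
static std::uint16_t extractLane(PackedV2S16 v, unsigned lane) {
  return static_cast<std::uint16_t>(v >> (16u * lane));
}

// Shuffle two packed sources. Mask entries 0-1 select from a, 2-3 select from b.
static PackedV2S16 shuffleV2S16(PackedV2S16 a, PackedV2S16 b,
                                std::array<unsigned, 2> mask) {
  std::uint16_t lo =
      mask[0] < 2 ? extractLane(a, mask[0]) : extractLane(b, mask[0] - 2);
  std::uint16_t hi =
      mask[1] < 2 ? extractLane(a, mask[1]) : extractLane(b, mask[1] - 2);
  // Repack the two selected lanes into a single 32-bit value.
  return static_cast<PackedV2S16>(lo) | (static_cast<PackedV2S16>(hi) << 16);
}
```

Each mask entry here amounts to "pick the low or high half of one source", which is the kind of selection the comment expects to map more directly onto op_sel than onto the expanded form.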
Since this function only does what its name implies when the MI is an s16 shuffle, consider either inlining it at its single use site or giving it a more accurate name.