This replaces the MOVMSK combine introduced at D52121/rL342326
(movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))
with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra.