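For an eq/ne compare of (and X, C0) with (shift X, C1): if C0 is a mask
and the shift by C1 shifts out all the masked bits (so the compare
essentially checks two subsets of X against each other), we can freely
rewrite the shift as either srl or shl, adjusting C0 to match.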
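The srl/shl equivalence can be checked exhaustively on 8-bit values. This is a minimal sketch, assuming the mask is the one that keeps exactly the bits the shift leaves in place (0xFF >> c for the srl form, 0xFF << c for the shl form). The 8-bit width and the helper name srlAndShlFormsAgree are illustrative, not taken from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if, for every 8-bit x, the srl form
//   (x & (0xFF >> c)) == (x >> c)
// and the shl form
//   (x & (0xFF << c)) == (x << c)   (truncated to 8 bits)
// give the same result.
bool srlAndShlFormsAgree(unsigned c) {
  assert(c >= 1 && c < 8 && "shift amount must be in [1, 7]");
  const uint8_t srlMask = static_cast<uint8_t>(0xFFu >> c);  // keeps the low 8-c bits
  const uint8_t shlMask = static_cast<uint8_t>(0xFFu << c);  // keeps the high 8-c bits
  for (unsigned v = 0; v < 256; ++v) {
    const uint8_t x = static_cast<uint8_t>(v);
    const bool srlForm = (x & srlMask) == static_cast<uint8_t>(x >> c);
    const bool shlForm = (x & shlMask) == static_cast<uint8_t>(x << c);
    if (srlForm != shlForm)
      return false;
  }
  return true;
}
```

Both forms reduce to the same condition, bit i of X equals bit i+c for every i below 8-c, which is why the helper returns true for every shift amount from 1 to 7.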
If C1 (shift amount) is a power of 2, we can replace the and+shift
with a rotate.
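The rotate replacement can be checked the same way. The sketch below assumes the srl form with mask 0xFF >> c on 8-bit values. The names rotr8 and andShiftMatchesRotate are hypothetical. The compare requires bit i to equal bit i+c, and the rotate form wraps those chains around cleanly only when c divides the bit width. For power-of-2 widths, that is exactly the case when c is a power of 2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rotate an 8-bit value right by c (1 <= c <= 7).
uint8_t rotr8(uint8_t x, unsigned c) {
  assert(c >= 1 && c < 8);
  return static_cast<uint8_t>((x >> c) | (x << (8 - c)));
}

// Hypothetical helper: returns true if (x & (0xFF >> c)) == (x >> c)
// holds exactly when rotr8(x, c) == x, for every 8-bit x.
bool andShiftMatchesRotate(unsigned c) {
  assert(c >= 1 && c < 8);
  const uint8_t mask = static_cast<uint8_t>(0xFFu >> c);
  for (unsigned v = 0; v < 256; ++v) {
    const uint8_t x = static_cast<uint8_t>(v);
    const bool andShift = (x & mask) == static_cast<uint8_t>(x >> c);
    const bool rotate = rotr8(x, c) == x;
    if (andShift != rotate)
      return false;  // found an x where the two compares disagree
  }
  return true;
}
```

For example, andShiftMatchesRotate(4) holds, but andShiftMatchesRotate(3) does not. With c = 3, x = 73 (0b01001001) satisfies the and+shift compare, yet rotr8(73, 3) is 41.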
Otherwise, depending on target preference, we can freely swap between srl
and shl to get better constants.
On x86 we can use this re-ordering to:
- get better and-mask constants for C0 (masks that become zero-extending moves, or that avoid an imm64).
- convert srl to shl when the shl can be implemented with lea or add (both of which can be preferable).
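A target preference along these lines could be sketched as a small cost comparison. This is not LLVM's implementation. The scoring in maskScore, the tie-break, and all names here are hypothetical and only mirror the two x86 points above. A zero-extending move (movzbl, movzwl, or a 32-bit mov) is treated as best. A mask that fits the `and` immediate comes next; for 64-bit `and`, that immediate is a sign-extended imm32. A 64-bit mask that needs an imm64 is treated as worst. When both forms cost the same, a shift by 1 to 3 picks shl, since it can be emitted as add or lea.

```cpp
#include <cassert>
#include <cstdint>

enum class ShiftForm { Srl, Shl };

// Hypothetical cost score for `and x, mask` on x86 (higher is better):
//   2 - the and can become a zero-extending move (movzbl/movzwl/mov r32)
//   1 - the mask fits the instruction's immediate field
//   0 - a 64-bit mask needs a separate imm64 (movabs) first
int maskScore(uint64_t mask, unsigned width) {
  if (mask == 0xFF || mask == 0xFFFF || (width == 64 && mask == 0xFFFFFFFF))
    return 2;
  if (width == 32)
    return 1;  // and r32, imm32 accepts any 32-bit constant
  const int64_t s = static_cast<int64_t>(mask);
  return (s >= INT32_MIN && s <= INT32_MAX) ? 1 : 0;  // sign-extended imm32?
}

// Hypothetical heuristic: pick the srl or shl form of
// (x & c0) == (x shift c) for a 32- or 64-bit x.
ShiftForm chooseForm(unsigned width, unsigned c) {
  assert((width == 32 || width == 64) && c >= 1 && c < width);
  const uint64_t allOnes = width == 64 ? ~uint64_t{0} : uint64_t{0xFFFFFFFF};
  const uint64_t srlMask = allOnes >> c;              // keeps the low bits
  const uint64_t shlMask = (allOnes << c) & allOnes;  // keeps the high bits
  const int srlScore = maskScore(srlMask, width);
  const int shlScore = maskScore(shlMask, width);
  if (srlScore != shlScore)
    return shlScore > srlScore ? ShiftForm::Shl : ShiftForm::Srl;
  // Equal mask cost: shl by 1..3 can be emitted as add or lea.
  return c <= 3 ? ShiftForm::Shl : ShiftForm::Srl;
}
```

For example, with a 64-bit x and c = 24, the srl mask 0xFFFFFFFFFF would need an imm64. The shl mask 0xFFFFFF_FF000000 (-2^24) fits a sign-extended imm32, so chooseForm(64, 24) picks shl. With a 32-bit x and c = 24, the srl mask 0xFF becomes a movzbl, so srl is kept.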