Inspired by gcc's assembly: https://godbolt.org/z/54hbzsGYn, while referring to D130203
Replace AND+IMM{32,64} with a slli.
But gcc does not handle 0xffff and 0xffffffff, which also seem to be optimizable.
The testcases copies all the bits in D130203 and adds 16, 32, and 64 bits.
is isShiftedMask_64(Mask) && TrailingZeros == 0 the same as isMask_64(Mask)