The vperm2x128 instructions (VPERM2F128 / VPERM2I128) can shuffle zeros into either 128-bit half of the result at no extra cost: bits 3 and 7 of the immediate control byte zero the low and high halves, respectively.
This patch recognizes that type of shuffle and generates the appropriate control byte.
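
For illustration only, here is a minimal standalone sketch of how such a control byte could be built. It is not the code in this patch: the helper name, the kZeroHalf sentinel, and the per-half selector encoding are hypothetical. Only the immediate layout (bits 1:0 and 5:4 select a source half, bits 3 and 7 zero the low and high result halves) follows the documented instruction encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical sentinel: this 128-bit half of the result should be all zeros.
constexpr int kZeroHalf = -1;

// Hypothetical helper (not the patch's implementation).
// Each selector names the source of one 128-bit half of the result:
//   0 = low half of src1, 1 = high half of src1,
//   2 = low half of src2, 3 = high half of src2,
//   kZeroHalf = zero this half.
// Returns the imm8 control byte, or nullopt for an invalid selector.
std::optional<uint8_t> vperm2x128ControlByte(int lowSel, int highSel) {
  const int sels[2] = {lowSel, highSel};
  unsigned imm = 0;
  for (int i = 0; i < 2; ++i) {
    unsigned field;
    if (sels[i] == kZeroHalf)
      field = 0x8; // Bit 3 of the nibble is the zeroing bit.
    else if (sels[i] >= 0 && sels[i] <= 3)
      field = static_cast<unsigned>(sels[i]); // Bits 1:0 pick a source half.
    else
      return std::nullopt;
    imm |= field << (4 * i); // Low half uses bits 3:0, high half uses bits 7:4.
  }
  return static_cast<uint8_t>(imm);
}

// Small self-check of the sketch above.
inline void checkVperm2x128ControlByte() {
  // Low half from src1's low half, high half zeroed.
  assert(vperm2x128ControlByte(0, kZeroHalf).value() == 0x80);
  // Low half zeroed, high half from src2's low half.
  assert(vperm2x128ControlByte(kZeroHalf, 2).value() == 0x28);
}
```

When a half is zeroed, its selector bits are ignored by the hardware, so the sketch simply leaves them at 0.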
Note: I have a follow-on patch that converts vperm2 intrinsics with zero masks into generic shuffles. That should close the loop on this special-purpose x86 permute.