This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to native IR in cases where the result's length is less than 128 bits.
The resulting IR for 256-bit inputs is folded into VPMOV instructions in D46957, while for 128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps) instructions are generated instead. D48822 adds fast-isel tests that demonstrate generated instructions.