This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to native IR in the cases where the truncated result is narrower than 128 bits.
The resulting IR for 256-bit inputs is folded into VPMOV instructions by D46957, while for 128-bit inputs vpshufb (or, in the 64-to-32-bit case, vinsertps) is generated instead. D48822 adds fast-isel tests that demonstrate the generated instructions.
Are we happy with using illegal types like this? What about flipping the shuffle and convert?
return (__m128i)__builtin_convertvector(__builtin_shufflevector((__v8hi)__A, (__v8hi){0, 0, 0, 0, 0, 0, 0, 0}, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), __v16qi);
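To make the two orderings concrete, here is a minimal scalar sketch in plain C of the 16-to-8-bit, 128-bit-input case. It only models the element-wise semantics and is not the header code from the patch. The function names (convert_then_shuffle, shuffle_then_convert, check_orders_agree) are hypothetical and chosen for illustration. It assumes the intrinsic truncates each 16-bit lane to its low 8 bits and zeroes the upper 64 bits of the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Truncate first, then pad: the 8 x i8 intermediate is only 64 bits wide,
   which is the "illegal type" the question above refers to. */
static void convert_then_shuffle(const uint16_t in[8], uint8_t out[16]) {
    uint8_t narrow[8];
    for (int i = 0; i < 8; ++i)
        narrow[i] = (uint8_t)(in[i] & 0xFF);   /* keep the low byte */
    memcpy(out, narrow, 8);                    /* lanes 0..7 */
    memset(out + 8, 0, 8);                     /* lanes 8..15 are zero */
}

/* Flipped order: pad with zero lanes first (16 x i16), then truncate
   every lane, so the final result is a full 16 x i8 vector. */
static void shuffle_then_convert(const uint16_t in[8], uint8_t out[16]) {
    uint16_t wide[16] = {0};
    memcpy(wide, in, 8 * sizeof(uint16_t));    /* lanes 0..7 from the input */
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)(wide[i] & 0xFF);
}

/* Both orders should produce the same 16 bytes for any input. */
static void check_orders_agree(const uint16_t in[8]) {
    uint8_t a[16], b[16];
    convert_then_shuffle(in, a);
    shuffle_then_convert(in, b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

In this model the two orders are interchangeable. The difference is only in the intermediate type: 8 x i8 (64 bits) in the first case versus 16 x i16 (256 bits) in the flipped one.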