[x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w]
This should help in lowering the following four intrinsics:
_mm256_cvtepi32_epi8
_mm256_cvtepi64_epi16
_mm256_cvtepi64_epi8
_mm512_cvtepi64_epi8
Differential Revision: https://reviews.llvm.org/D46957