This patch extends the existing lowerShuffleWithVPMOV with a lowerShuffleWithVTRUNC wrapper that handles basic binary shuffles that can be lowered either as a pure ISD::TRUNCATE or a X86ISD::VTRUNC (with undef/zero values in the remaining upper elements).
We concat the binary sources together into a single 256-bit source vector. To avoid regressions we perform this after we've tried to lower with PACKS/PACKUS which typically does a cleaner job than a concat.
For non-AVX512VL cases we have to canonicalize VTRUNC cases to use a 512-bit source vectors (inserting undefs/zeros in the upper elements as necessary), truncate and then (possibly) extract the 128-bit result.
This should address the last regressions in D66004
When does this condition happen? Doesn't Mask always follow VT in shuffle lowering?