[x86] Enable some support for lowerVectorShuffleWithUndefHalf with AVX-512
This teaches 512-bit shuffles to detect unused halfs in order to reduce shuffle size.
We may need to refine the 512-bit exit point. I couldn't remember if we had good cross lane shuffles for 8/16 bit with AVX-512 or not.
I believe this is step towards being able to handle D36454 without a special case.
From here we need to improve our ability to combine extract_subvector with insert_subvector and other extract_subvectors. And we need to support narrowing binary operations where we don't demand all elements. This may be improvements to DAGCombiner::narrowExtractedVectorBinOp(by recognizing an insert_subvector in addition to concat) or we may need a target specific combiner.
Reviewers: RKSimon, zvi, delena, jbhateja
Reviewed By: RKSimon, jbhateja
Differential Revision: https://reviews.llvm.org/D36601