This patch promotes i1 shuffles to a wider type during DAG combine instead of during lowering. This gives the truncates and extends that the promotion introduces more opportunities to be combined with other operations.
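To illustrate what "promoting an i1 shuffle" means, here is a toy model in plain C++. It is not LLVM's SelectionDAG API. The helper names (sextToI8, shuffleI8, truncToI1, promotedI1Shuffle) are hypothetical, and i8 lanes are an assumed wider type chosen only for illustration. The X86 backend's actual widened type is not shown here. A vXi1 value is modeled as std::vector<bool>, and a shuffle mask indexes the concatenation of the two inputs, with -1 meaning an undef lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sign-extend each i1 lane to an i8 lane (true -> -1, false -> 0).
std::vector<int8_t> sextToI8(const std::vector<bool>& v) {
  std::vector<int8_t> out;
  out.reserve(v.size());
  for (bool bit : v) out.push_back(bit ? int8_t(-1) : int8_t(0));
  return out;
}

// Two-input shuffle on the wider element type. Mask entries index into the
// concatenation of a and b; -1 is an undef lane, modeled here as 0.
std::vector<int8_t> shuffleI8(const std::vector<int8_t>& a,
                              const std::vector<int8_t>& b,
                              const std::vector<int>& mask) {
  std::vector<int8_t> out;
  out.reserve(mask.size());
  for (int m : mask) {
    if (m < 0) {
      out.push_back(0);  // undef lane: any value is legal, pick 0
      continue;
    }
    std::size_t idx = static_cast<std::size_t>(m);
    assert(idx < a.size() + b.size() && "mask index out of range");
    out.push_back(idx < a.size() ? a[idx] : b[idx - a.size()]);
  }
  return out;
}

// Truncate each i8 lane back to i1 by keeping only the low bit.
std::vector<bool> truncToI1(const std::vector<int8_t>& v) {
  std::vector<bool> out;
  out.reserve(v.size());
  for (int8_t x : v) out.push_back((x & 1) != 0);
  return out;
}

// The promoted i1 shuffle: extend both operands, shuffle wide, truncate back.
std::vector<bool> promotedI1Shuffle(const std::vector<bool>& a,
                                    const std::vector<bool>& b,
                                    const std::vector<int>& mask) {
  return truncToI1(shuffleI8(sextToI8(a), sextToI8(b), mask));
}
```

Because the extend and truncate are explicit steps in this model, a neighboring extend or truncate could in principle cancel against them. That is the kind of opportunity the patch is after when it does the promotion in DAG combine. This sketch does not perform any such folding itself.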
There is a regression on v32i1 in here. I'm hoping D42031 will help with that, but I haven't checked yet.
I had to add an additional target-independent DAG combine that folds a vector_shuffle broadcast whose source is an insert_element into the broadcasted element, turning it into a broadcast of the inserted scalar. Normally we catch this by turning the insert_element into a build_vector, and then the shuffle combines with that. But in some cases the new X86 combine here promoted the vector_shuffle before we had a chance to convert the insert_element to a build_vector, so we missed the combine.
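The pattern that combine targets can be sketched on a toy representation. This is not the actual LLVM combine. InsertElt, splatLane, and combineBroadcastOfInsert are hypothetical names, and vector elements are modeled as ints. The idea is that if every defined lane of the shuffle mask reads lane K, and the shuffle's source is insert_element(V, X, K), then every defined output lane is X, so the whole shuffle can become a splat of X.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for insert_element(vec, scalar, index); not an LLVM type.
struct InsertElt {
  std::vector<int> vec;
  int scalar;
  int index;
};

// If every defined (non-negative) mask entry is the same lane, return it.
std::optional<int> splatLane(const std::vector<int>& mask) {
  std::optional<int> lane;
  for (int m : mask) {
    if (m < 0) continue;  // undef lanes do not constrain the splat
    if (lane && *lane != m) return std::nullopt;
    lane = m;
  }
  return lane;
}

// Fold shuffle(insert_element(V, X, K), undef, <K, K, ...>) into a splat of X.
// Returns std::nullopt when the pattern does not match.
std::optional<std::vector<int>> combineBroadcastOfInsert(
    const InsertElt& ins, const std::vector<int>& mask) {
  assert(ins.index >= 0 &&
         static_cast<std::size_t>(ins.index) < ins.vec.size());
  std::optional<int> lane = splatLane(mask);
  if (!lane || *lane != ins.index) return std::nullopt;
  // Every defined lane reads the inserted scalar, so V's other elements are
  // dead. Undef lanes may legally take any value, so they get X as well.
  return std::vector<int>(mask.size(), ins.scalar);
}
```

Matching the insert_element directly means the fold no longer depends on the insert_element having already been turned into a build_vector, which is the ordering problem described above.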