Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget.
I tried to use SplitBinaryOpsAndApply, but the 512-bit path through that code assumes a destination VT with i16 or i8 elements, since it checks for BWI. I don't need a BWI check here, nor do I need any splitting for 512 bits, so I just made an AVX1 helper. I used SelectionDAG::SplitVector and SelectionDAG::SplitDestVT since those already know how to split in half. Since this runs during DAG combine, it doesn't need the BUILD_VECTOR check that the extractSubvector helper uses.
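As a rough illustration of the split-in-half idea, here is a toy model in plain C++. It is not LLVM code: `Vec4`, `Vec8`, `splitInHalf`, `concatHalves`, `applyLanes`, and `combineWideOp` are all hypothetical names, and the `needsSplit` flag stands in for the type and subtarget check, on the assumption that the wide form of the op is unavailable on an AVX1-only target.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Toy stand-ins for a 128-bit and a 256-bit vector of 32-bit lanes.
// These are NOT LLVM types; they only model the lane layout.
using Vec4 = std::array<int, 4>;
using Vec8 = std::array<int, 8>;

// Split a wide vector into its low and high halves.
std::pair<Vec4, Vec4> splitInHalf(const Vec8 &v) {
  Vec4 lo{}, hi{};
  for (std::size_t i = 0; i < 4; ++i) {
    lo[i] = v[i];
    hi[i] = v[i + 4];
  }
  return {lo, hi};
}

// Concatenate two halves back into a wide vector.
Vec8 concatHalves(const Vec4 &lo, const Vec4 &hi) {
  Vec8 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = lo[i];
    out[i + 4] = hi[i];
  }
  return out;
}

// Apply a lane-wise binary op to two vectors of the same width.
template <std::size_t N, typename LaneOp>
std::array<int, N> applyLanes(const std::array<int, N> &a,
                              const std::array<int, N> &b, LaneOp op) {
  std::array<int, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = op(a[i], b[i]);
  return out;
}

// If the (hypothetical) subtarget check says the wide op is unavailable,
// split both operands, apply the op to each half, and concatenate.
template <typename LaneOp>
Vec8 combineWideOp(const Vec8 &a, const Vec8 &b, LaneOp op, bool needsSplit) {
  if (!needsSplit)
    return applyLanes(a, b, op);
  std::pair<Vec4, Vec4> aParts = splitInHalf(a);
  std::pair<Vec4, Vec4> bParts = splitInHalf(b);
  return concatHalves(applyLanes(aParts.first, bParts.first, op),
                      applyLanes(aParts.second, bParts.second, op));
}
```

Both paths should produce the same lanes for any lane-wise op; the split only changes how wide each individual operation is.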
The test change is a regression, but it's not strictly related to this change. There's a DAG combine in visitBUILD_VECTOR that only runs during the post type legalization DAG combine or the DAG combine after legalizing vector ops. But those DAG combines only run if type legalization or vector op legalization changed the DAG. The DAG combine I've added here runs before type legalization, and afterwards there are no illegal types and no vector ops to legalize, so the DAG combine in visitBUILD_VECTOR never got a chance to run. I've tried to fix the build_vector DAG combine to work before type legalization, but I've hit other issues that I'm trying to work through. Hopefully we can resolve that separately.
You're declaring the return type as uint64_t but returning -1?
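For context on why this matters, here is a minimal sketch with hypothetical function names (`lookupImplicit`, `lookupExplicit`) and a hypothetical constant (`kNotFound`); none of them come from the patch under review. Returning -1 from a function declared to return `uint64_t` is well-defined in C++, but the value wraps to the largest `uint64_t`, so a caller testing for a negative result would never see one. If a sentinel is really intended, naming it makes that explicit.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical named sentinel: the largest representable uint64_t.
constexpr std::uint64_t kNotFound =
    std::numeric_limits<std::uint64_t>::max();

// Mirrors the questioned pattern: -1 is implicitly converted to uint64_t,
// which yields 2^64 - 1 rather than a negative value.
std::uint64_t lookupImplicit(bool found) {
  if (!found)
    return -1;
  return 42;
}

// Same behavior, but the sentinel is spelled out for the reader.
std::uint64_t lookupExplicit(bool found) {
  return found ? 42 : kNotFound;
}
```

A signed return type is another option if a negative error value is what the author meant; which is appropriate depends on how callers use the result.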