I suggested this change in D7898: hoist the lowerVectorShuffleAsElementInsertion() call into lower256BitVectorShuffle().
It improves the v4i64 case although not optimally. This AVX codegen:
vmovq {{.*#+}} xmm0 = mem[0],zero vxorpd %ymm1, %ymm1, %ymm1 vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
Becomes:
vmovsd {{.*#+}} xmm0 = mem[0],zero
Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here:
- We're not handling v32i8 / v16i16.
- We're not getting the FP / int domains right for instruction selection.
But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64, I'm submitting this patch ahead of fixing the above.
So, this is what you meant when you said that we don't get the correct fp/int domain.
In X86InstrSSE.td we have patterns like this:
Do you plan to send a follow-up patch to fix tablegen patterns so that VMOVQI2PQIrm is used instead of VMOVSDrm for the integer domain?. If that's the case, then it makes sense to commit this patch first and fix the fp/int domain issue in a separate patch.