This is a partial fix for PR21711 (http://llvm.org/bugs/show_bug.cgi?id=21711). When we extract multiple consecutive elements from a vector to create a build_vector, we should try to form an extract_subvector instead of relying solely on getVectorShuffle().
The difference in output for the simplest v4f64 test case looks like this:
    vextractf128 $1, %ymm0, %xmm0
    vpermilpd    $1, %xmm0, %xmm1     ## xmm1 = xmm0[1,0]
    vunpcklpd    %xmm1, %xmm0, %xmm0  ## xmm0 = xmm0[0],xmm1[0]
    vmovapd      %xmm0, (%rdi)
    vzeroupper
    retq
Becomes:
    vextractf128 $1, %ymm0, (%rdi)
    vzeroupper
    retq
We should still fix the shuffle problem in the x86 backend, but I thought it was best to solve the higher-level problem first. There's also a bug in the x86 backend when lowering an EXTRACT_SUBVECTOR node with arbitrary indices, so I've limited this patch to firing on the (most common?) case of half-vector extractions. This pattern emerges in particular on Sandy Bridge because it cracks 32-byte memops in half, causing mismatches in vector sizes.
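
For illustration only, the half-vector restriction can be sketched as a standalone check. This is not the code in the patch and does not use LLVM's SelectionDAG API: the helper name halfVectorExtractStart and its parameters are hypothetical. It assumes the build_vector operands have already been reduced to the element indices they extract from a single source vector.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of LLVM): given the source-element index read
// by each build_vector operand, return the starting index if the operands are
// exactly the lower or upper half of the source vector, in order.
std::optional<std::size_t>
halfVectorExtractStart(const std::vector<std::size_t> &Indices,
                       std::size_t SrcNumElts) {
  // Only even-sized, non-empty source vectors can be split in half.
  if (SrcNumElts == 0 || SrcNumElts % 2 != 0)
    return std::nullopt;

  const std::size_t Half = SrcNumElts / 2;
  if (Indices.size() != Half)
    return std::nullopt;

  // Restrict to half-vector extractions: start at element 0 or at the midpoint.
  const std::size_t Start = Indices.front();
  if (Start != 0 && Start != Half)
    return std::nullopt;

  // Every operand must read the next consecutive source element.
  for (std::size_t I = 0; I < Half; ++I)
    if (Indices[I] != Start + I)
      return std::nullopt;

  assert(Start + Half <= SrcNumElts && "half must lie inside the source");
  return Start;
}
```

For the v4f64 test case above, the operands read elements 2 and 3 of the source, so the sketch returns 2 (the upper half). A swapped order such as 3, 2 is rejected and would still have to go through the shuffle path.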