This fixes PR15872. The generated code improves from:
    vpextrd $1, %xmm0, %eax
    vmovd %xmm0, %ecx
    vmovd %ecx, %xmm1
    vpinsrd $1, %eax, %xmm1, %xmm1
    vextractf128 $1, %ymm0, %xmm2
    vmovd %xmm2, %eax
    vpinsrd $2, %eax, %xmm1, %xmm1
    vpextrd $1, %xmm2, %eax
    vpinsrd $3, %eax, %xmm1, %xmm1
    vpextrd $3, %xmm0, %eax
    vpextrd $2, %xmm0, %ecx
    vmovd %ecx, %xmm0
    vpinsrd $1, %eax, %xmm0, %xmm0
    vpextrd $2, %xmm2, %eax
    vpinsrd $2, %eax, %xmm0, %xmm0
    vpextrd $3, %xmm2, %eax
    vpinsrd $3, %eax, %xmm0, %xmm0
    vmovdqa %xmm1, (%rdi)
    vzeroupper
    retq
to
    vextractf128 $1, %ymm0, %xmm1
    vpunpcklqdq %xmm1, %xmm0, %xmm2 # xmm2 = xmm0[0],xmm1[0]
    vpunpckhqdq %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[1],xmm1[1]
    vmovdqa %xmm2, (%rdi)
    vzeroupper
    retq
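For anyone checking that the two sequences are equivalent, here is a rough lane-level model in Python. It is only a sketch: the function names are made up for illustration, %ymm0 is modeled as a list of eight 32-bit lanes, and it says nothing about how the backend actually selects these instructions.

```python
def split_ymm(ymm):
    """Split eight 32-bit lanes into the low and high 128-bit halves."""
    assert len(ymm) == 8
    return ymm[:4], ymm[4:]


def old_sequence(ymm0):
    """Model of the scalarized vpextrd/vpinsrd chain (hypothetical helper)."""
    lo, hi = split_ymm(ymm0)               # hi is what vextractf128 $1 yields
    xmm1 = [lo[0], lo[1], hi[0], hi[1]]    # built one dword at a time
    xmm0 = [lo[2], lo[3], hi[2], hi[3]]
    return xmm1, xmm0                      # (stored to (%rdi), left in %xmm0)


def unpack_qdq(a, b, high):
    """Model of vpunpcklqdq (high=False) or vpunpckhqdq (high=True).

    Each 64-bit element is two adjacent 32-bit lanes; the result takes
    the selected quadword of a followed by the same quadword of b.
    """
    i = 2 if high else 0
    return a[i:i + 2] + b[i:i + 2]


def new_sequence(ymm0):
    """Model of the vextractf128 + unpack pair (hypothetical helper)."""
    lo, hi = split_ymm(ymm0)
    xmm2 = unpack_qdq(lo, hi, high=False)  # xmm2 = xmm0[0],xmm1[0]
    xmm0 = unpack_qdq(lo, hi, high=True)   # xmm0 = xmm0[1],xmm1[1]
    return xmm2, xmm0


lanes = list(range(8))
assert old_sequence(lanes) == new_sequence(lanes)
print(new_sequence(lanes))  # ([0, 1, 4, 5], [2, 3, 6, 7])
```

In both versions the stored vector holds dwords 0, 1, 4, 5 and the returned vector holds dwords 2, 3, 6, 7, i.e. the even and odd 64-bit elements of the input.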
Sanjay has a fix for PR21711, which apparently has the same underlying issue: http://reviews.llvm.org/D6622
This version is more general, but it may be too general. I'm fine with anything in this vein that fixes both PRs.
This isn't passing 'make check' for me. On an AVX2 machine, we generate 'vextracti128' (the integer flavor of the extract instruction).