This fixes PR15872. The generated code improves from:
    vpextrd $1, %xmm0, %eax
    vmovd %xmm0, %ecx
    vmovd %ecx, %xmm1
    vpinsrd $1, %eax, %xmm1, %xmm1
    vextractf128 $1, %ymm0, %xmm2
    vmovd %xmm2, %eax
    vpinsrd $2, %eax, %xmm1, %xmm1
    vpextrd $1, %xmm2, %eax
    vpinsrd $3, %eax, %xmm1, %xmm1
    vpextrd $3, %xmm0, %eax
    vpextrd $2, %xmm0, %ecx
    vmovd %ecx, %xmm0
    vpinsrd $1, %eax, %xmm0, %xmm0
    vpextrd $2, %xmm2, %eax
    vpinsrd $2, %eax, %xmm0, %xmm0
    vpextrd $3, %xmm2, %eax
    vpinsrd $3, %eax, %xmm0, %xmm0
    vmovdqa %xmm1, (%rdi)
    vzeroupper
    retq
to
    vextractf128 $1, %ymm0, %xmm1
    vpunpcklqdq %xmm1, %xmm0, %xmm2  # xmm2 = xmm0[0],xmm1[0]
    vpunpckhqdq %xmm1, %xmm0, %xmm0  # xmm0 = xmm0[1],xmm1[1]
    vmovdqa %xmm2, (%rdi)
    vzeroupper
    retq
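Reading the old sequence lane by lane, it builds two 4 x i32 results from the 8 x i32 value in ymm0: elements <0,1,4,5> (stored to (%rdi)) and <2,3,6,7> (left in xmm0). Because each adjacent pair of 32-bit indices selects an aligned, consecutive pair of source elements, both shuffles can be viewed as 64-bit-element shuffles <0,2> and <1,3>, which is exactly what one punpcklqdq/punpckhqdq each provide. The C++ below is a minimal model of that reasoning, not LLVM code: the helper names (punpcklqdq, punpckhqdq, shuffle4, widenMask, etc.) are hypothetical, and the mask-widening check is one way to recognize the pattern, not necessarily how this patch does it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using V4x32 = std::array<uint32_t, 4>; // one 128-bit xmm register as 4 x i32
using V8x32 = std::array<uint32_t, 8>; // one 256-bit ymm register as 8 x i32

// Model of "vpunpcklqdq b, a, dst" (AT&T): dst = a.q[0], b.q[0],
// i.e. a[0], a[1], b[0], b[1] when viewed as 32-bit lanes.
V4x32 punpcklqdq(const V4x32 &a, const V4x32 &b) {
  return {a[0], a[1], b[0], b[1]};
}

// Model of "vpunpckhqdq b, a, dst" (AT&T): dst = a.q[1], b.q[1].
V4x32 punpckhqdq(const V4x32 &a, const V4x32 &b) {
  return {a[2], a[3], b[2], b[3]};
}

// Low 128 bits of a ymm value (what xmm0 aliases).
V4x32 lowHalf(const V8x32 &v) { return {v[0], v[1], v[2], v[3]}; }

// High 128 bits of a ymm value (what "vextractf128 $1" produces).
V4x32 highHalf(const V8x32 &v) { return {v[4], v[5], v[6], v[7]}; }

// Reference semantics of the old code: pick each 32-bit lane individually,
// as the vpextrd/vpinsrd chain does.
V4x32 shuffle4(const V8x32 &src, const std::array<int, 4> &mask) {
  V4x32 r{};
  for (int i = 0; i < 4; ++i) r[i] = src[mask[i]];
  return r;
}

// Try to widen a 32-bit-element mask to 64-bit elements. Each pair
// (mask[2k], mask[2k+1]) must be (2j, 2j+1) and becomes the index j.
// Undef (-1) entries are not handled; they simply make widening fail.
std::optional<std::vector<int>> widenMask(const std::vector<int> &mask) {
  if (mask.size() % 2 != 0) return std::nullopt;
  std::vector<int> wide;
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    int lo = mask[i], hi = mask[i + 1];
    if (lo % 2 != 0 || hi != lo + 1) return std::nullopt;
    wide.push_back(lo / 2);
  }
  return wide;
}
```

With ymm0 = {10, ..., 17}, punpcklqdq(lowHalf, highHalf) yields {10, 11, 14, 15}, matching shuffle4 with mask <0,1,4,5>, and widenMask turns that mask into <0,2>. A mask such as <1,2,4,5> cannot be widened, so it would still need a narrower lowering.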
Sanjay has a fix for PR21711, which apparently has the same underlying issue, here: http://reviews.llvm.org/D6622
This version is more general, but it may be too general; I'm fine with anything in this vein that fixes both PRs.