We generate much better code these days than we used to. And we use the same sequence for AVX1 and AVX2 for these
For v4i64->v4i32 we generate
vextractf128 xmm1, ymm0, 1 vshufps xmm0, xmm0, xmm1, 136 # xmm0 = xmm0[0,2],xmm1[0,2]
And for v8i64->v8i32 we generate
vperm2f128 ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3] vinsertf128 ymm0, ymm0, xmm1, 1 vshufps ymm0, ymm0, ymm2, 136 # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6]