We already use vmovq for v2i64/v2f64 vzmovl, but we were using a
blend with 0 for v4i64/v4f64 and vmovsd with 0 for v8i64/v8f64.
I think the blend with 0 or the scalar movss/movsd is only needed
for vXi32, where we don't have an instruction that can move 32
bits from one xmm register to another while zeroing the upper bits.
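
For reference, vzmovl keeps the lowest element of the source vector and
zeroes all the others. A minimal scalar sketch of that behaviour follows;
the helper name vzmovl_model is hypothetical and is not part of LLVM. It
models only the result of the shuffle, not the instruction selection.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the vzmovl shuffle pattern (not LLVM code):
// element 0 is copied from the source, every other element becomes zero.
template <typename T, std::size_t N>
std::array<T, N> vzmovl_model(const std::array<T, N> &src) {
  static_assert(N > 0, "vector must have at least one element");
  std::array<T, N> result{}; // value-initialized, so all elements start at zero
  result[0] = src[0];        // keep only the lowest element
  return result;
}
```

For a v4i64-style input such as {7, 8, 9, 10}, the model returns
{7, 0, 0, 0}. Only the low 64 bits carry data, which is why a single
64-bit move that zeroes the upper bits is enough.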