The 64-bit BUILD_VECTOR is normally promoted to 128 bits by widening each element. This changes the element size and requires a shuffle to compress the result back down to 64 bits. The copy from XMM to MMX is also done through memory.
This patch instead pads the vector with undef elements and emits an explicit XMM->MMX copy operation.
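
As a rough illustration of the padding idea only, here is a minimal standalone C++ sketch. It is not the code in this patch and does not use any LLVM API: `Lane` and `padToXmm` are hypothetical names, and an empty `std::optional` stands in for an undef element. It shows that padding keeps the original element width and leaves the original lanes in the low 64 bits, so no shuffle is needed to narrow them afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a vector lane: an empty optional stands for undef.
using Lane = std::optional<std::uint64_t>;

// Pad the lanes of a 64-bit vector out to 128 bits by appending undef lanes.
// The element width (eltBits) is unchanged, so the original data still
// occupies the low 64 bits of the wider vector.
std::vector<Lane> padToXmm(const std::vector<Lane>& lanes, unsigned eltBits) {
    assert(eltBits != 0 && 128 % eltBits == 0);
    assert(lanes.size() * eltBits == 64);
    std::vector<Lane> out(lanes);
    out.resize(128 / eltBits, std::nullopt);  // new lanes are undef
    return out;
}
```

For example, two 32-bit lanes become four 32-bit lanes with the upper two undef, whereas widening each element would produce two 64-bit lanes that then have to be shuffled back down.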