When splitting 64-bit operations, create the correct
VALU instructions immediately.
This was splitting things like s_or_b64 into the two
s_or_b32s and then pushing the new instructions
onto the worklist. There's no reason we need
to do this intermediate step.