This fixes a case where the argument to a sendmsg intrinsic
ends up in a VGPR, for whatever reason.
The underlying performance issue is that a multiplication that
can be an s_mul_i32 is instead needlessly generated as
v_mul_u32_u24, but this is not addressed by this patch.
Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328
Do not you want to return SRegs[0] here (after the loop) if SubRegs == 1 and if not move creation of DstReg after that? It will save some code duplication.