In the si-fix-sgpr-copies pass, a COPY instruction (VGPR to SGPR) is lowered either to VALU or to v_readfirstlane_b32. The choice is based on the SALU instructions that use the result of the COPY. This misses the case where a use of the COPY result must be a scalar register. For example, buffer instructions have scalar operands (srsrc, sOffset) that only accept scalar registers.
This change lowers VGPR-to-SGPR copies to v_readfirstlane_b32 when the result of the COPY is used as a scalar operand of a MUBUF/MTBUF instruction.
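As a rough illustration of the intended decision, here is a minimal standalone sketch. It is not the code in the pass: `Lowering`, `Use`, `requiresSGPR`, and `chooseLowering` are hypothetical names, and the existing SALU-user heuristic is not modelled but simply passed in as `heuristicChoice`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of the decision; none of these names are LLVM APIs.
enum class Lowering { ReadFirstLane, MoveToVALU };

struct Use {
    std::string opcode;  // e.g. "S_ADD_U32" or "BUFFER_LOAD_DWORD" (illustrative only)
    bool requiresSGPR;   // true if this operand slot only accepts a scalar register
};

// Pick a lowering for a VGPR-to-SGPR COPY from the uses of its result.
// heuristicChoice stands in for whatever the existing SALU-user heuristic returns.
Lowering chooseLowering(const std::vector<Use>& uses, Lowering heuristicChoice) {
    for (const Use& use : uses) {
        // A use that only accepts a scalar register (such as srsrc or sOffset)
        // forces v_readfirstlane_b32, regardless of the heuristic.
        if (use.requiresSGPR)
            return Lowering::ReadFirstLane;
    }
    // No scalar-only use: keep the existing decision.
    return heuristicChoice;
}
```

In this sketch a single scalar-only use overrides the heuristic, which is the behaviour this change describes for MUBUF/MTBUF scalar operands.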
This isn't really specific to MUBUF instructions; it applies to any operand that has to be scalar. We have to waterfall calls as well.