When older waves execute long sequences of VALU instructions, they may
prevent younger waves from calculating addresses and issuing their
VMEM loads, which in turn leaves the VALU unit idle. This patch tries
to prevent this by temporarily raising the wave's priority.
A few notes and questions:
- This intentionally avoids counting VALU instructions, as it is not yet clear how we would want to do that in the presence of branches and loops. However, the patch aims to make adding such counting as simple as possible by separating the identification of the VMEM loads we want to take into account from the rest of the logic.
- The implementation assumes that one s_setprio 0 followed by another one is acceptable when edge splitting would be the only way to avoid this.
- I understand we want this to be enabled by default, but I left the pass disabled for now, until it is production-ready, so as not to touch tests that should not normally be affected by the pass.
- If the s_setprio instruction is not universally available, I guess we may want to make sure the selected target does actually support it?
- What do we do if s_setprio instructions are already present in the input code? Would doing nothing be acceptable and reasonable in such a case?
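
To make the separation mentioned in the notes concrete, here is a minimal standalone sketch of the idea on a single straight-line block. It is not the code in this patch: `Inst`, `Kind`, `isInterestingVMEMLoad` and `raisePriorityUntilLastLoad` are hypothetical names, control flow is ignored entirely, the raised priority value of 3 is an assumption, and "do nothing if a priority change is already present" is just one possible answer to the last question above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds; a hypothetical stand-in for real machine opcodes.
enum class Kind { VALU, VMEMLoad, VMEMStore, SetPrio };

struct Inst {
  Kind K;
  unsigned Prio; // only meaningful for SetPrio
};

// The only place that decides which loads matter. A VALU-counting
// heuristic could later be added here without touching the rest.
static bool isInterestingVMEMLoad(const Inst &I) {
  return I.K == Kind::VMEMLoad;
}

static bool hasSetPrio(const std::vector<Inst> &Block) {
  for (const Inst &I : Block)
    if (I.K == Kind::SetPrio)
      return true;
  return false;
}

// Raises the priority at the start of the block and lowers it again right
// after the last interesting load. Returns true if the block was changed.
static bool raisePriorityUntilLastLoad(std::vector<Inst> &Block,
                                       unsigned HighPrio = 3) {
  assert(HighPrio <= 3 && "assumed priority range of 0..3");

  // Assumption: leave blocks with existing priority changes untouched.
  if (hasSetPrio(Block))
    return false;

  // Find the last load we care about; Block.size() means "none found".
  std::size_t Last = Block.size();
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (isInterestingVMEMLoad(Block[I]))
      Last = I;
  if (Last == Block.size())
    return false;

  // Insert the lowering first so that the index of the load stays valid.
  Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(Last + 1),
               Inst{Kind::SetPrio, 0});
  Block.insert(Block.begin(), Inst{Kind::SetPrio, HighPrio});
  return true;
}
```

For a block of the form VALU, VMEM load, VALU, this yields a raise at the top, then VALU and the load, then a lowering to 0, then the trailing VALU. A real pass would additionally have to decide where to lower the priority across branches and loops, which is where the repeated s_setprio 0 mentioned above can arise.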
It's a bit more normal to put "Enable" options in AMDGPUTargetMachine, where you can use isPassEnabled.
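
For readers unfamiliar with that pattern, the following standalone sketch models how such a check typically behaves: an explicit command-line setting always wins, otherwise the option's default applies only at a sufficient optimization level. `BoolOpt` and `OptLevel` are hypothetical stand-ins for the real LLVM types, and the signature is simplified, so the in-tree helper in AMDGPUTargetMachine.cpp should be treated as the reference.

```cpp
// Hypothetical stand-in for a boolean command-line option.
struct BoolOpt {
  bool Value;
  int NumOccurrences; // how often the flag was given explicitly
  int getNumOccurrences() const { return NumOccurrences; }
  operator bool() const { return Value; }
};

// Hypothetical stand-in for the codegen optimization level.
enum class OptLevel { None, Less, Default, Aggressive };

// An explicitly given flag overrides everything; otherwise the pass only
// runs at or above the required optimization level, using the default.
static bool isPassEnabled(const BoolOpt &Opt, OptLevel Current,
                          OptLevel Required = OptLevel::Default) {
  if (Opt.getNumOccurrences())
    return Opt;
  if (Current < Required)
    return false;
  return Opt;
}
```

With this shape, an option that defaults to false keeps the pass off unless the flag is passed explicitly, which matches the "disabled for now" state described in the summary.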