For AMDGPU, if an operand requires an SGPR but is only available as a
VGPR, a loop needs to be introduced to execute the instruction
with each unique combination of values across all lanes. The rest of
the instructions in the block will be moved to a new block following
the loop. Check if the next instruction's parent changed, and update
the iterators and insertion block if this happened.
Tests will be included in a future patch.
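
The iterator fix-up described above can be sketched in a self-contained
way. This is only an illustration under stated assumptions, not the code
in the patch: Inst, Block, Function, splitAfter and visitAll are
hypothetical stand-ins for the MachineInstr/MachineBasicBlock machinery,
std::list models the instruction list, and the loop blocks themselves
are not modeled (only the move of the trailing instructions to a new
block is).

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <memory>
#include <vector>

struct Block;

// Hypothetical stand-in for an instruction.
struct Inst {
  int Id;
  bool NeedsLoop; // models "operand requires an SGPR but is in a VGPR"
  Block *Parent;
};

// Hypothetical stand-in for a basic block.
struct Block {
  std::list<Inst> Insts;
};

struct Function {
  std::vector<std::unique_ptr<Block>> Blocks;
  Block *addBlock() {
    Blocks.push_back(std::make_unique<Block>());
    return Blocks.back().get();
  }
};

// Move every instruction after I into a fresh block, as the mapping
// would do after inserting its loop. std::list::splice keeps the
// iterators valid; they now refer to elements of the new block.
void splitAfter(Function &F, Block &BB, std::list<Inst>::iterator I) {
  Block *Rest = F.addBlock();
  Rest->Insts.splice(Rest->Insts.end(), BB.Insts, std::next(I),
                     BB.Insts.end());
  for (Inst &X : Rest->Insts)
    X.Parent = Rest;
}

// Visit every instruction that started out in BB, following it into
// whichever block it ends up in. Returns the ids in visit order.
std::vector<int> visitAll(Function &F, Block *BB) {
  std::vector<int> Visited;
  auto I = BB->Insts.begin();
  auto End = BB->Insts.end();
  while (I != End) {
    auto Cur = I++;
    // Record this before the split so we never compare iterators
    // that belong to different lists.
    bool NextIsEnd = (I == End);
    Visited.push_back(Cur->Id);

    if (Cur->NeedsLoop)
      splitAfter(F, *BB, Cur);

    // The next instruction may have been moved: if its parent
    // changed, continue in the new block with a new end iterator.
    if (!NextIsEnd && I->Parent != BB) {
      BB = I->Parent;
      End = BB->Insts.end();
    }
  }
  return Visited;
}
```

Without the parent check, End would still refer to the original block
while I walks the new one, so the loop would never see its end iterator.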
Does this potentially break the iterators of the RPOT?