Ensure we only put the exec modification in a terminator instruction.
The trickiest part of this was dealing with SI_KILL_CLEANUP, which I
don't fully understand. This patch tries to preserve it as a terminator and
avoids removing it.
Authored by arsenm on Sep 11 2020, 12:35 PM.
SI_KILL_CLEANUP pseudos are inserted to mark points where control flow merges and hence the exec mask can be evaluated for early termination of a pixel shader.
It's unclear to me what this is trying to achieve. If it is to prevent
    bb:
      <-- reload inserted here during live range splitting
      $exec = S_OR_B64 $exec, %other
      ... rest of code ...
... then this change only replaces it by:
    bb:
      <-- reload inserted here during live range splitting
      $exec = S_OR_B64_term $exec, %other
      // fallthrough
    bb.new:
      ... rest of code ...
The inserted reload code is as incorrect as it was before.
I'm not trying to fully solve the live range splitting problem that greedy regalloc hits. I'm trying to eliminate the isBasicBlockPrologue concept that fastregalloc trips over when inserting spills at the beginning of the block.
What if the concept of a basic block prolog *is* the correct long term solution?
I still do not see how you can get away without isBasicBlockPrologue. I can see how splitting can help with it, but not without it. You can split everything and have S_OR as the only instruction in its block, but that does not prevent RA from inserting a reload right before it in the same BB.
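
To make the last point concrete, here is a minimal toy model in C++. It is only a sketch and does not use LLVM's classes: `ToyInstr`, `ToyBlock`, `skipPrologue`, `insertReload` and `makeExecRestoreBlock` are hypothetical names, and the `isPrologue` flag merely stands in for whatever isBasicBlockPrologue would report. It assumes the allocator places reloads at the top of a block, as in the snippets above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; not LLVM's MachineInstr.
struct ToyInstr {
  std::string text;
  bool isPrologue; // hypothetical flag mirroring the isBasicBlockPrologue idea
};

using ToyBlock = std::vector<ToyInstr>;

// Index of the first instruction that is not part of the block prologue.
std::size_t skipPrologue(const ToyBlock &bb) {
  std::size_t i = 0;
  while (i < bb.size() && bb[i].isPrologue)
    ++i;
  return i;
}

// Insert a reload at the top of the block and return the index it landed at.
// When respectPrologue is false, the insertion point is the very first slot.
std::size_t insertReload(ToyBlock &bb, bool respectPrologue) {
  std::size_t pos = respectPrologue ? skipPrologue(bb) : 0;
  assert(pos <= bb.size());
  bb.insert(bb.begin() + static_cast<std::ptrdiff_t>(pos),
            ToyInstr{"reload", false});
  return pos;
}

// The fully split case from the discussion: the exec restore is the only
// instruction in its block.
ToyBlock makeExecRestoreBlock() {
  ToyBlock bb;
  bb.push_back(ToyInstr{"$exec = S_OR_B64_term $exec, %other", true});
  return bb;
}
```

In this model, calling `insertReload(bb, false)` on the block from `makeExecRestoreBlock()` puts the reload at index 0, ahead of the exec restore, even though that instruction is alone in its block. Calling `insertReload(bb, true)` puts it at index 1, after the restore. Splitting the block changes nothing about the first case; only the prologue check does.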