This has a couple of benefits:
- It can sometimes re-form clusters that were broken apart when the register allocator inserted a copy.
- Post-RA scheduling does not have to worry about increasing register pressure, which in some cases gives it more freedom to reorder instructions.
Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions increased by about 1%.
- The number of runs of two or more load instructions increased by about 4%.
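
For illustration, the two metrics above could be computed along the following lines. This is a minimal sketch, not the script used for the measurements: the instruction stream, the mnemonics, and the `is_load` predicate are hypothetical stand-ins for whatever the real analysis extracted from the compiled shaders.

```python
from itertools import groupby


def load_runs(opcodes, is_load):
    """Return the lengths of all maximal runs of consecutive load instructions."""
    return [
        sum(1 for _ in group)
        for key, group in groupby(opcodes, key=is_load)
        if key  # keep only the groups made of loads
    ]


def run_stats(opcodes, is_load):
    """Return (average run length, number of runs with two or more loads)."""
    runs = load_runs(opcodes, is_load)
    average = sum(runs) / len(runs) if runs else 0.0
    multi = sum(1 for length in runs if length >= 2)
    return average, multi


# Hypothetical instruction stream; the mnemonics are illustrative only.
stream = ["load", "load", "add", "load", "mul", "load", "load", "load"]
avg_len, multi_runs = run_stats(stream, lambda op: op == "load")
print(avg_len, multi_runs)
```

A run of length one counts toward the average but not toward the "two or more" total, which is why the two figures can move by different amounts.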