The addi instruction is usually used to post-increment the loop induction variable, which looks like this:
  label_X:
    load x, base(i)
    ...
    y = op x
    ...
    i = addi i, 1
    goto label_X
However, on PowerPC, if there are too many VSX instructions between y = op x and i = addi i, 1, they use up all the hardware resources and block the execution of i = addi i, 1, which in turn stalls the load instruction in the next iteration. To avoid this starvation, a heuristic is added that moves the addi as early as possible, so that the load can hide the latency of the VSX instructions. It applies only when no other heuristic has already decided between the candidates.
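
The tie-break described above could be sketched roughly as follows. This is a simplified, standalone illustration, not the code in this patch: Candidate, shouldReplace and pickNext are hypothetical names, and the integer score is an assumed stand-in for the combined outcome of the earlier heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of a scheduling candidate.
struct Candidate {
  std::string name;
  int score;    // assumed stand-in for the result of earlier heuristics; higher wins
  bool isAddi;  // true for the induction-variable increment
};

// Returns true if tryCand should replace the current best candidate.
bool shouldReplace(const Candidate &best, const Candidate &tryCand) {
  // An earlier heuristic already separated the two candidates: respect it.
  if (tryCand.score != best.score)
    return tryCand.score > best.score;
  // No other heuristic applied: bias towards the addi so that it issues
  // early and the next iteration's load is not held up behind it.
  return tryCand.isAddi && !best.isAddi;
}

// Picks the index of the next instruction to schedule from the ready list.
std::size_t pickNext(const std::vector<Candidate> &ready) {
  assert(!ready.empty() && "ready list must not be empty");
  std::size_t best = 0;
  for (std::size_t i = 1; i < ready.size(); ++i)
    if (shouldReplace(ready[best], ready[i]))
      best = i;
  return best;
}
```

With a ready list holding a VSX op and the addi at equal score, pickNext selects the addi; if an earlier heuristic gives the VSX op a higher score, the VSX op is still picked first.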
Testing on our internal loop benchmarks shows an improvement with this change.