In some loops, we end up generating loop induction variables that look like:
{(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
{(zext i16 (%i0 * %i1) to i32),+,-1}
i.e. we count up from -limit to 0, rather than the simpler count down from limit to 0. This happens because the two candidates get the same score under LSR's cost calculation, and the second is filtered out in favour of the first. We end up with a redundant SUB from 0 in the code.
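For illustration, the two IV shapes correspond roughly to the C++ loops below. This is a sketch only: `limitOf`, `tripCountUp` and `tripCountDown` are hypothetical names, and the `i16` multiply is modelled by truncating to `uint16_t`. Both loops run `limit` times, but the count-up form first has to compute `0 - limit`, which is the redundant subtraction.

```cpp
#include <cstdint>

// Models zext i16 (%i0 * %i1) to i32: multiply, wrap to 16 bits, widen to i32.
constexpr std::int32_t limitOf(std::uint16_t i0, std::uint16_t i1) {
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(i0) * i1);
}

// {(-1 * limit),+,1}: the start value needs an extra negation (0 - limit).
constexpr int tripCountUp(std::uint16_t i0, std::uint16_t i1) {
  int count = 0;
  for (std::int32_t iv = 0 - limitOf(i0, i1); iv != 0; ++iv)
    ++count;
  return count;
}

// {limit,+,-1}: the start value is used directly and counts down to 0.
constexpr int tripCountDown(std::uint16_t i0, std::uint16_t i1) {
  int count = 0;
  for (std::int32_t iv = limitOf(i0, i1); iv != 0; --iv)
    ++count;
  return count;
}
```

Both forms execute the same number of iterations, which is why nothing but the setup work distinguishes them.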
This patch tries to make the setup cost calculation a little more thorough, recursing into the operands of each SCEV to better approximate the setup required. The cost function used to compare LSR costs is:
return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
       std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this only changes the result when all of the other fields compare equal.
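To make the tie-break concrete, here is a minimal C++ sketch of a recursive setup cost feeding into that lexicographic comparison. It assumes a toy expression tree: `Kind`, `Expr`, `setupCost`, `Cost`, `isLess` and the example values are hypothetical names for illustration, not LLVM's SCEV API. The leaf cost of 1, the per-node rules and the depth limit of 5 are illustrative choices, not necessarily what the patch implements.

```cpp
#include <tuple>

// Hypothetical, heavily simplified stand-in for SCEV node kinds. This is not
// LLVM's SCEV class hierarchy; it only models the shapes in the two IVs above.
enum class Kind { Unknown, Constant, ZExt, Mul, AddRec };

struct Expr {
  Kind kind;
  const Expr *lhs; // ZExt operand, first Mul operand, or AddRec start
  const Expr *rhs; // second Mul operand or AddRec step; otherwise nullptr
};

// Recursive setup-cost estimate (illustrative weights): leaves (values and
// constants) cost 1, casts and AddRecs cost what their operand/start costs,
// and a Mul costs the sum of its operands. Depth bounds the walk; once it
// reaches 0, a non-leaf subtree counts as 0.
constexpr unsigned setupCost(const Expr *E, unsigned Depth) {
  if (E->kind == Kind::Unknown || E->kind == Kind::Constant)
    return 1;
  if (Depth == 0)
    return 0;
  switch (E->kind) {
  case Kind::AddRec: // only the start value is computed before the loop
  case Kind::ZExt:
    return setupCost(E->lhs, Depth - 1);
  case Kind::Mul:
    return setupCost(E->lhs, Depth - 1) + setupCost(E->rhs, Depth - 1);
  default:
    return 0;
  }
}

// The two candidate IVs from the description, built bottom-up.
constexpr Expr I0{Kind::Unknown, nullptr, nullptr}; // %i0
constexpr Expr I1{Kind::Unknown, nullptr, nullptr}; // %i1
constexpr Expr MinusOne{Kind::Constant, nullptr, nullptr};
constexpr Expr One{Kind::Constant, nullptr, nullptr};
constexpr Expr Prod{Kind::Mul, &I0, &I1};                // (%i0 * %i1)
constexpr Expr Ext{Kind::ZExt, &Prod, nullptr};          // zext i16 ... to i32
constexpr Expr NegExt{Kind::Mul, &MinusOne, &Ext};       // -1 * (zext ...)
constexpr Expr CountUp{Kind::AddRec, &NegExt, &One};     // {(-1 * zext),+,1}
constexpr Expr CountDown{Kind::AddRec, &Ext, &MinusOne}; // {zext,+,-1}

// Hypothetical mirror of the fields compared above (names from the text).
struct Cost {
  unsigned NumRegs, AddRecCost, NumIVMuls, NumBaseAdds, ScaleCost, ImmCost,
      SetupCost;
};

// Lexicographic comparison: SetupCost is consulted only when every earlier
// field compares equal.
constexpr bool isLess(const Cost &C1, const Cost &C2) {
  return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                  C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
         std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                  C2.ScaleCost, C2.ImmCost, C2.SetupCost);
}

// Assumed for illustration: every field before SetupCost is tied.
constexpr unsigned MaxDepth = 5;
constexpr Cost UpCost{1, 1, 0, 0, 0, 0, setupCost(&CountUp, MaxDepth)};
constexpr Cost DownCost{1, 1, 0, 0, 0, 0, setupCost(&CountDown, MaxDepth)};
```

With these illustrative weights the count-up start value costs 3 (the extra `-1 *` contributes a constant leaf) against 2 for the count-down one, so with every earlier field tied the comparison prefers `{(zext i16 (%i0 * %i1) to i32),+,-1}`.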
I've run benchmarks and measured codesize on ARM and AArch64; they showed minor performance improvements and some codesize improvements.
However, this does alter some of the Hexagon tests, in ways I'm not 100% sure are "better" for those tests. I think swp-carried has too many undefs for the costs to be calculated correctly. swp-epilog-phi5.ll now has a "loop0" and a "loop1", which may mean it is no longer pipelined? And two-combinations-bug.ll no longer shows the behaviour the bug test is checking for, so I've turned this extra cost calculation off there. My understanding is that because we only alter the SetupCost, the last of the compared fields, this shouldn't make loops worse in most cases.