This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Attempt to increase the accuracy of LSR's setup cost
ClosedPublic

Authored by dmgreen on Feb 28 2019, 3:09 AM.

Details

Summary

In some loops, we end up generating loop induction variables that look like:
{(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
{(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to 0. This is because the scores, as LSR calculates them, are the same and the second is filtered in place of the first. We end up with a redundant SUB from 0 in the code.

This patch tries to make the calculation of the setup cost a little more thorough, recursing into the scev members to better approximate the setup required. The cost function for comparing LSR costs is:

return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
       std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                C2.ScaleCost, C2.ImmCost, C2.SetupCost);

So this will only alter results if none of the other variables turn out to be different.

I've ran benchmarks and codesize on ARM and AArch64, but showed minor improvements in performance and some codesize improvements.

However, this does seem to alter some of the tests in hexagon, in ways that I'm not 100% sure for these tests are "better". I think swp-carried has too many undef's for it to calculate the costs correctly. swp-epilog-phi5.ll now has a "loop0" and a "loop1" which may mean it's not pipelined any more? And two-combinations-bug.ll isn't showing the same behaviour as the bug is trying to test, so I've turned this extra cost calculation off there. My understanding is that because we are only altering the SetupCost, the last in the list of compared variables, this shouldn't really be making the loops worse in most cases.

Diff Detail

Event Timeline

dmgreen created this revision.Feb 28 2019, 3:09 AM

Hi David,

The Hexagon related test changes look good to me. Neither of these test changes indicate that your patch causes any problems. In swp-carried-1.ll, the hardware loop is no longer generated. In this case, I should just change the test to use a real value for the initial loop induction variable, %v4, instead of undef, say 0 (I can do that later). In swp-epilog-phi5.ll, the compiler is now generating an extra, innermost, hardware loop, so that's why the test changes from loop0 to loop1. The loop1/endloop1 instructions represent an outer hardware loop. This is another test that I should fix since subtle changes in the generated code cause the compiler to create/not create a hardware loop. Thanks for letting us know about the changes to the Hexagon tests. Let me know if you have any further questions.

Thanks,
Brendon

Hi Dave,

Is the lsr-setupcost test new? I don't see it in my working directory.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1224

Shouldn't we have this before the first recursive call?

1303

Use a lamda instead?

The Hexagon related test changes look good to me. Neither of these test changes indicate that your patch causes any problems. In swp-carried-1.ll, the hardware loop is no longer generated. In this case, I should just change the test to use a real value for the initial loop induction variable, %v4, instead of undef, say 0 (I can do that later). In swp-epilog-phi5.ll, the compiler is now generating an extra, innermost, hardware loop, so that's why the test changes from loop0 to loop1. The loop1/endloop1 instructions represent an outer hardware loop. This is another test that I should fix since subtle changes in the generated code cause the compiler to create/not create a hardware loop.

Thanks. I didn't realise there could be nested hardware loops. Good to see there are not worse. Are you OK with me leaving them as they are here, and you fixing them as you think is best? I had a go at fixing swp-carried-1.ll, but wasn't sure how best to keep testing the old behaviour without the -lsr-recursive-setupcost=0 option, even after removing the undefs. It was using a scev like {(128 + (-1 * undef)),+,-1} before.

Is the lsr-setupcost test new? I don't see it in my working directory.

Yep, that's not is tree yet. I was trying to show the differences caused by the patch. (It would have been clearer if I'd have explained that.)

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1224

It's was trying to replicate the behaviour from before this patch, so allowing the recursion into the addrec. EnableRecursiveSetupCost may not have been the best name for that exactly. I'll update the code to match the name better.

1303

It needs to be recursive (and recursive into the lambda in std::accumulate). I got tired of trying to make that work and pulled it out into a separate function.

dmgreen updated this revision to Diff 189140.Mar 4 2019, 7:03 AM

Switchup the position of EnableRecursiveSetupCost.

samparker added inline comments.Mar 5 2019, 12:05 AM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
1224

Fair enough, then maybe we should still check for a constant start for the AddRec even when recursion isn't enabled?

Thanks. I didn't realise there could be nested hardware loops. Good to see there are not worse. Are you OK with me leaving them as they are here, and you fixing them as you think is best?

Yes. I'm ok with leaving them the way that you have changed them. I'll update the tests later.

Thanks,
Brendon

dmgreen added inline comments.Mar 7 2019, 4:01 AM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
1224

Not sure. This version here is probably cleanest going forward. The earlier version is more like the old behaviour, but not identical.

I don't think it's worth replicating the old behaviour exactly, if it's going to look ugly with explicitly checking AddRec start's, as opposed to just relying on the recursion. Unless we see regressions come up from this, which I don't think we should.

But I don't have a strong opinion. Let me know what you think.

samparker accepted this revision.Mar 7 2019, 4:20 AM

I agree this looks cleaner. LGTM.

This revision is now accepted and ready to land.Mar 7 2019, 4:20 AM

Thanks Sam. And thanks Brendon!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2019, 5:46 AM
Herald added a subscriber: hiraditya. · View Herald Transcript