This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Add profitablilty check for conversion to mtctr loops
ClosedPublic

Authored by lei on Sep 22 2017, 5:45 PM.

Details

Summary

Add profitability checks for modifying counted loops to use the mtctr instruction.

The latency of mtctr is only justified if there are more than 4 comparisons that will be removed as a result.

Usually counted loops are formed relatively early and before unrolling, so most low trip count loops often don't survive. However we want to ensure that if they do, we do not mistakenly update them to mtctr loops.

Diff Detail

Repository
rL LLVM

Event Timeline

lei created this revision.Sep 22 2017, 5:45 PM
hfinkel added inline comments.Sep 25 2017, 10:09 AM
test/CodeGen/PowerPC/ctr-minmaxnum.ll
63 ↗(On Diff #116448)

Please update the trip counts on these tests so we don't lose coverage.

lei added inline comments.Sep 25 2017, 3:04 PM
test/CodeGen/PowerPC/ctr-minmaxnum.ll
63 ↗(On Diff #116448)

This test case verifies that for short loops with a trip count of 2, we don't generate mtctr for for pwr7+ but we do generate mtctr for bluegene. On power, these short loops are usually unrolled and never make it this far so naturally we don't see mtctr. However now that we put in a check to not use mtctr for short loops, we should not see this instruction anymore even for short loops which were not unrolled previously. As far as I can tell this should be the desired behaviour now. Unless I am missing something, I don't think we are losing any coverage here.

I have added test case below to ensure we still generate the mtctr loops for trip counts >= 4.

nemanjai added inline comments.Sep 25 2017, 3:36 PM
test/CodeGen/PowerPC/ctr-minmaxnum.ll
63 ↗(On Diff #116448)

Well, there are still two open questions here:

  • Is this something we want on the A2 cores or would we always want an mtctr?
  • can we add a "trip count >= 4" test case for the A2 core as well?
lei updated this revision to Diff 116629.Sep 25 2017, 6:42 PM

Since we are preventing the compiler from converting short loops to ctr loops, updating the ctr-minmaxnum.ll checks to be more restrictive and add default behaviour tests for the A2 core.

hfinkel added inline comments.Sep 25 2017, 8:32 PM
test/CodeGen/PowerPC/ctr-minmaxnum.ll
63 ↗(On Diff #116448)

I'm not entirely sure I understand the premise for doing exactly this in any case. I understand that short loops are generally unrolled (for all cores). However:

  • I imagine this is not really optimal for the A2. mtctr has a 6-cycle latency, so the loop would need to be really small in order for that to cause a hazard.
  • Even on the POWER cores, is this really a function of a small trip count along, or also having a small loop? You can have a large loop with a small trip count, and will want to do this transformation in order to reduce register pressure within the loop.

In short, I suspect we'll want to restrict this to only some cores, and we'll want to use CodeMetrics to limit this to only do something on small loops. See, for example, ApproximateLoopSize in LoopUnrollPass.cpp.

lei updated this revision to Diff 117022.Sep 28 2017, 10:59 AM

Use CodeMetrics to ensure we are only doing this for small loops with small trip counts.

hfinkel added inline comments.Oct 3 2017, 8:13 PM
lib/Target/PowerPC/PPCCTRLoops.cpp
495 ↗(On Diff #117022)

Please add a comment here that 6 is an approximate latency for the mtctr instruction.

test/CodeGen/PowerPC/ctr-minmaxnum.ll
63 ↗(On Diff #116448)

Please increase the trip counts of these loops so they still test what they're supposed to test (i.e. whether we can form a ctr-based loop with these various functions calls in the loop based on our knowledge of how these will be lowered (i.e., some will be lowered using instructions, not actual calls, depending on the enabled target features, enabling us to nevertheless use the ctr-based loop)).

lei updated this revision to Diff 117917.Oct 5 2017, 3:38 PM

Address review comments.

lei marked 2 inline comments as done.Oct 5 2017, 3:38 PM
hfinkel accepted this revision.Oct 5 2017, 3:41 PM

LGTM

lib/Target/PowerPC/PPCCTRLoops.cpp
124 ↗(On Diff #117917)

Spaces around the =.

This revision is now accepted and ready to land.Oct 5 2017, 3:41 PM
lei updated this revision to Diff 117935.Oct 5 2017, 4:43 PM

tc update.

echristo accepted this revision.Oct 5 2017, 5:42 PM
echristo added inline comments.
lib/Target/PowerPC/PPCCTRLoops.cpp
124 ↗(On Diff #117917)

The initialization here looks a little weird since it's not anywhere else...

lei added inline comments.Oct 11 2017, 3:01 PM
lib/Target/PowerPC/PPCCTRLoops.cpp
124 ↗(On Diff #117917)

Yes, this is not needed. I will remove the initialization prior to committing this patch.

This revision was automatically updated to reflect the committed changes.