This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Allow unrolling on multi-block loops.
ClosedPublic

Authored by samparker on Oct 16 2017, 6:18 AM.

Details

Summary

Before, loop unrolling was only enabled for loops with a single block. This restriction has been removed and replaced by:

  • allow a maximum of two exiting blocks,
  • a four basic block limit for cores with a branch predictor.

Diff Detail

Event Timeline

samparker created this revision.Oct 16 2017, 6:18 AM
fhahn added a subscriber: fhahn.Oct 16 2017, 6:22 AM
fhahn added a comment.Oct 16 2017, 6:25 AM

Did you do some performance experiments showing that those settings are better than the original ones? If so, could you share some numbers?

We don't run LNT on these cores, but I have numbers from running some industry standard benchmarks, but unfortunately I cannot share exact details. I've observed that half of the benchmarks gain a speedup and there's only one regression. The speedups are between 1 and 20%, with half of those being 5% and over.

efriedma added inline comments.
lib/Target/ARM/ARMTargetTransformInfo.cpp
606

The magic numbers "2" and "4" need comments explaining them.

628

Would it make sense to break out of this loop early if "Cost" is 12 or more?

samparker added inline comments.Oct 17 2017, 2:58 AM
lib/Target/ARM/ARMTargetTransformInfo.cpp
628

No, we still want to allow partial and runtime unrolling and the default thresholds in the unroller work well for loops which the unroller understands. The cost is used to force the unroller to do its thing on small loops that it otherwise wouldn't unroll.

samparker updated this revision to Diff 119276.Oct 17 2017, 3:00 AM

Hi Eli,

Thanks for taking a look, I've added comments to explain the magic numbers.

cheers,
sam

efriedma added inline comments.Oct 17 2017, 11:20 AM
lib/Target/ARM/ARMTargetTransformInfo.cpp
628

I just looked again. As far as I can tell, the only use of "Cost" is the "if (Cost < 12)" check.

samparker added inline comments.Oct 18 2017, 1:36 AM
lib/Target/ARM/ARMTargetTransformInfo.cpp
628

Yes, sorry, somehow I managed to misread your question. We don't want to break early because a call maybe discovered once Cost > 12, in this case we could still unroll and prevent inlining of the call.

This revision is now accepted and ready to land.Oct 20 2017, 12:47 PM
This revision was automatically updated to reflect the committed changes.