Previously, loop unrolling was only enabled for loops with a single block. This restriction has been removed and replaced by:
- a maximum of two exiting blocks,
- a limit of four basic blocks for cores with a branch predictor.
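The two limits above can be read as a simple shape check on the loop. The following is a minimal standalone sketch of that check, not the code from the patch: `LoopShape`, `shapeAllowsUnrolling`, and the two constant names are hypothetical and exist only for illustration, and the sketch assumes both limits are inclusive (exactly two exiting blocks and exactly four blocks are still allowed).

```cpp
#include <cassert>

// Hypothetical summary of a loop's control-flow shape; not an LLVM type.
struct LoopShape {
  unsigned numBlocks;        // basic blocks in the loop body
  unsigned numExitingBlocks; // loop blocks with an edge leaving the loop
};

// Limits taken from the summary; the names are assumptions.
constexpr unsigned kMaxExitingBlocks = 2;
constexpr unsigned kMaxBlocksWithPredictor = 4;

// Returns true if the loop passes both limits described in the summary.
// The block-count limit only applies to cores with a branch predictor.
bool shapeAllowsUnrolling(const LoopShape &loop, bool hasBranchPredictor) {
  // Every exiting block is itself a block of the loop.
  assert(loop.numExitingBlocks <= loop.numBlocks);

  if (loop.numExitingBlocks > kMaxExitingBlocks)
    return false;
  if (hasBranchPredictor && loop.numBlocks > kMaxBlocksWithPredictor)
    return false;
  return true;
}
```

Under these assumptions, a four-block loop with two exiting blocks passes on any core, while a five-block loop passes only when `hasBranchPredictor` is false. Naming the constants is one way to avoid bare "2" and "4" literals in such a check.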
Differential D38952
[ARM] Allow unrolling on multi-block loops.
Closed · Public · Authored by samparker on Oct 16 2017, 6:18 AM.
Event Timeline
Herald added subscribers: kristof.beyls, javed.absar, aemerson. · Oct 16 2017, 6:18 AM

Did you do some performance experiments showing that those settings are better than the original ones? If so, could you share some numbers?

We don't run LNT on these cores. I have numbers from running some industry-standard benchmarks, but unfortunately I cannot share exact details. I've observed that half of the benchmarks gain a speedup and there's only one regression. The speedups are between 1 and 20%, with half of those being 5% and over.

Hi Eli,
Thanks for taking a look. I've added comments to explain the magic numbers.
Cheers,

This revision is now accepted and ready to land. · Oct 20 2017, 12:47 PM
Closed by commit rL316313: [ARM] Allow unrolling of multi-block loops. (authored by sam_parker). · Oct 23 2017, 1:05 AM
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 119148
lib/Target/ARM/ARMTargetTransformInfo.cpp
test/Transforms/LoopUnroll/ARM/multi-blocks.ll
The magic numbers "2" and "4" need comments explaining them.