[ARM] Loop unrolling preferences for LOB cores
Needs Review (Public)

Authored by samparker on Sep 19 2019, 1:55 AM.

Details

Summary

Perform loop unrolling differently for cores with low-overhead branches:

  • Don't unroll the remainder loop, with the expectation that both the unrolled loop and the remainder will be converted into low-overhead loops (loloops).
  • Don't force-unroll small loops, as we now try to use the non-decrement form of LE for uncountable loops. That optimisation relies on CBZ/CBNZ, which have a limited branch range, so keeping code size down is important.
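
As a rough illustration of the two bullets above, the decision can be sketched as a small standalone function. This is not the patch itself and not LLVM's actual interface: `UnrollPrefs`, `choosePrefs`, `HasLOB`, `LoopCost` and the `SmallLoopCostLimit` value are all hypothetical names chosen for this sketch, and the baseline for non-LOB cores (unroll the remainder, force-unroll small loops) is an assumption inferred from the summary.

```cpp
// Hypothetical stand-in for the unrolling knobs discussed in the summary.
// The field names are illustrative only.
struct UnrollPrefs {
  bool Partial;         // allow partial unrolling
  bool Runtime;         // allow runtime unrolling
  bool UnrollRemainder; // also unroll the remainder loop
  bool Force;           // unroll even if the cost model says no
};

// Assumed cost threshold below which a loop counts as "small".
constexpr unsigned SmallLoopCostLimit = 12;

// Pick unrolling preferences for a loop of the given (abstract) cost.
constexpr UnrollPrefs choosePrefs(bool HasLOB, unsigned LoopCost) {
  // Assumed baseline: partial and runtime unrolling, remainder unrolled.
  UnrollPrefs UP{true, true, true, false};

  if (HasLOB) {
    // Leave the remainder as a loop so that it can become a
    // low-overhead loop as well, and never force-unroll, which keeps
    // code size down for short-range CBZ/CBNZ branches.
    UP.UnrollRemainder = false;
    return UP;
  }

  // Without low-overhead branches, force-unroll small loops.
  if (LoopCost < SmallLoopCostLimit)
    UP.Force = true;
  return UP;
}
```

Under these assumptions, `choosePrefs(true, 4)` leaves both `UnrollRemainder` and `Force` unset, while `choosePrefs(false, 4)` sets both.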

Event Timeline

samparker created this revision. Sep 19 2019, 1:55 AM

The added tests check exactly what's being changed/added here, so that's excellent.
I was just wondering whether it would be good to also add a more end-to-end llc test that shows actual loloop creation for these cases, if such tests are not already there. I only had a very quick look and am not sure, but you'll probably know.

test/Transforms/LoopUnroll/ARM/loop-unrolling.ll
Line 1:

nit: --check-prefixes=CHECK-UNROLL-A,CHECK is shorter

Line 48:

nit: this whole block looks exactly the same as CHECK-UNROLL-T2, so could be shared.

Cheers. I'm gonna put this patch on hold to do some more investigations into how hardware loops and loop unrolling interact... The current testing for this is in Transforms/HardwareLoops/ARM/structure.ll