This is still a work in progress: the interface for
UnrolledInstAnalyzer needs improving, and the code-size impact is
inconsistent, with both regressions and improvements (see the results
for X86 -Oz -flto below). Presumably the issue is that rotating more
loops exposes more loops to other loop optimizations, which do not
respect -Oz properly. For example, in the attached test case, the loop
in test2 will get fully unrolled.
Program                                      master   patch    diff
test-suite...ing-flt/Equivalencing-flt.test   17144   21240   23.9%
test-suite...ing-dbl/Equivalencing-dbl.test   21240   17144  -19.3%
test-suite...pps-C/SimpleMOC/SimpleMOC.test   26256   22160  -15.6%
test-suite...cations/hexxagon/hexxagon.test   29916   25772  -13.9%
test-suite...langs-C/football/football.test   40320   44416   10.2%
test-suite...abench/jpeg/jpeg-6a/cjpeg.test   69648   73784    5.9%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test   89404   93500    4.6%
test-suite...INT95/132.ijpeg/132.ijpeg.test   96772  100972    4.3%
test-suite...000/186.crafty/186.crafty.test  176188  180452    2.4%
test-suite...nsumer-lame/consumer-lame.test  189016  185032   -2.1%
test-suite...ications/JM/ldecod/ldecod.test  254648  249680   -2.0%
test-suite...ProxyApps-C++/CLAMR/CLAMR.test  244976  240696   -1.7%
test-suite...006/447.dealII/447.dealII.test  609760  603200   -1.1%
test-suite...nch/pcompress2/pcompress2.test   14500   14652    1.0%
test-suite...Rodinia/backprop/backprop.test   12828   12932    0.8%
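
For readers less familiar with the transformation discussed above, the
following is a minimal source-level sketch of what loop rotation does. It
is not the attached test case, and the function names are hypothetical;
the pass itself works on IR, not on C source. Rotation turns a top-tested
loop into a guarded bottom-tested one, a form that later loop passes are
generally better able to analyze, which is one plausible way additional
loops end up being unrolled.

```c
#include <stddef.h>

/* Hypothetical example. Original form: the exit test is at the top
   of the loop and runs before every iteration. */
long sum_top_tested(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i];
        ++i;
    }
    return s;
}

/* Rotated form: the first exit test becomes a guard in front of the
   loop, and the loop itself tests its condition at the bottom. */
long sum_rotated(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    if (i < n) {
        do {
            s += a[i];
            ++i;
        } while (i < n);
    }
    return s;
}
```

Both functions compute the same result; the rotated version duplicates
the exit test, which is where some of the code-size cost can come from.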