This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnroll] Enable advanced unrolling analysis by default.
ClosedPublic

Authored by mzolotukhin on May 20 2016, 12:33 PM.

Details

Summary

This patch turns on LoopUnrollAnalyzer by default. To mitigate
compile-time regressions, I chose very conservative thresholds for now.
Later we can make them more aggressive, but that might require being
smarter about which loops we optimize. For example, the biggest issue
right now is that with more aggressive thresholds we unroll many cold
loops. This increases compile time for no performance benefit: those
loops do run faster, but it doesn't matter because they are cold.
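To make the trade-off above concrete, here is a minimal sketch of a threshold-based full-unroll decision. It is hypothetical. `UnrollEstimate`, `UnrollThresholds`, `shouldFullyUnroll` and all numbers are illustrative, not LLVM's actual code, command-line options, or the thresholds chosen in this patch. It assumes the analysis reports the size of the fully unrolled loop and its estimated dynamic cost before and after unrolling. "Conservative" thresholds then mean a small size cap and a high required percentage of savings, so fewer loops pass the bar.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-loop estimate from an unrolling cost analysis
// (illustrative names, not LLVM's API).
struct UnrollEstimate {
  uint64_t UnrolledSize;        // static size of the fully unrolled loop
  uint64_t RolledDynamicCost;   // estimated dynamic cost of the loop as written
  uint64_t UnrolledDynamicCost; // estimated dynamic cost after unrolling and simplification
};

// Hypothetical thresholds. Conservative settings use a small size cap
// and a high required percentage of dynamic cost saved.
struct UnrollThresholds {
  uint64_t MaxUnrolledSize;
  uint64_t MinPercentSaved; // expected to be in [0, 100]
};

// Returns true only if full unrolling stays under the size cap and
// saves at least MinPercentSaved percent of the loop's dynamic cost.
bool shouldFullyUnroll(const UnrollEstimate &E, const UnrollThresholds &T) {
  assert(T.MinPercentSaved <= 100 && "percentage must be in [0, 100]");
  if (E.UnrolledSize > T.MaxUnrolledSize)
    return false; // too much code growth
  if (E.RolledDynamicCost == 0 || E.UnrolledDynamicCost >= E.RolledDynamicCost)
    return false; // unrolling would not reduce the dynamic cost
  uint64_t Saved = E.RolledDynamicCost - E.UnrolledDynamicCost;
  // Integer form of Saved / RolledDynamicCost >= MinPercentSaved / 100.
  return Saved * 100 >= T.MinPercentSaved * E.RolledDynamicCost;
}
```

For example, with a size cap of 100 and a 50% savings bar, a loop whose estimated dynamic cost drops from 1000 to 400 is unrolled, while one that only drops to 600 is not. Nothing in this check knows whether a loop is hot. Any cold loop that passes the bar still costs compile time, which is the concern the profile-based follow-up in the review is meant to address.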

Compile-time changes (4 samples per test to reduce noise):

MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes                  +5.19%
SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect  +4.19%
MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow          +3.39%
MultiSource/Applications/JM/lencod/lencod                       +1.47%
MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1           -6.06%

I didn't see any performance changes in the test-suite, but the patch
improves some internal tests.

Diff Detail

Repository
rL LLVM

Event Timeline

mzolotukhin retitled this revision to [LoopUnroll] Enable advanced unrolling analysis by default.
mzolotukhin updated this object.
mzolotukhin added reviewers: hfinkel, chandlerc.
mzolotukhin added a subscriber: llvm-commits.
chandlerc accepted this revision. May 20 2016, 1:09 PM
chandlerc edited edge metadata.

LGTM.

Maybe work on a patch to use profile info to adjust the thresholds? Once we have that, I think we can help benchmark a range of thresholds and produce some data about where the sweet spot lies.

This revision is now accepted and ready to land. May 20 2016, 1:09 PM

Hi Chandler,

Thanks for the LGTM. I'll commit the patch on Monday, as I won't be able to check the bots this weekend.

As for using profiling info: it does look like the best way to handle this. I also think it would make sense to use it more extensively everywhere, not only in loop unrolling. That way we can save compile time by skipping expensive optimizations on code that doesn't matter, and spend more time optimizing the code that does. For now these are just general ideas, but I hope to get to something more concrete soon.

Thanks,
Michael

This revision was automatically updated to reflect the committed changes.