This is an archive of the discontinued LLVM Phabricator instance.

Moving loop vectorization pass before loop-unroller.
Needs Revision (Public)

Authored by laxmansole on Jun 20 2016, 12:46 PM.

Details

Summary

Loops with a constant trip count get fully unrolled first, so they are never vectorized. This patch therefore moves the loop vectorizer before the loop unroller. Measured on AArch64, it improves performance by 2.2-5.7% on 300.twolf, 253.perlbmk, 256.bzip2, and 176.gcc, and regresses 175.vpr and 252.eon by 1.8-2.0%. It fixes the following bugs:
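The phase-ordering problem described above can be illustrated with a small toy model. This is a sketch, not LLVM code: the `Loop` record, the cost units, and the full-unroll budget `UNROLL_THRESHOLD = 150` are all hypothetical, and the only point is that a pass which acts on loops cannot see a loop that an earlier pass has already turned into straight-line code.

```python
from dataclasses import dataclass, replace
from typing import Optional

UNROLL_THRESHOLD = 150  # hypothetical full-unroll size budget (cost units)


@dataclass(frozen=True)
class Loop:
    trip_count: Optional[int]  # None if the trip count is unknown at compile time
    body_size: int             # hypothetical cost units per iteration
    is_loop: bool = True       # False once fully unrolled into straight-line code
    vector_width: int = 1      # 1 means scalar


def full_unroll(code: Loop, threshold: int = UNROLL_THRESHOLD) -> Loop:
    """Fully unroll a loop whose constant trip count keeps it under the budget."""
    if (code.is_loop and code.trip_count is not None
            and code.trip_count * code.body_size <= threshold):
        return replace(code, is_loop=False)
    return code


def loop_vectorize(code: Loop, width: int = 4) -> Loop:
    """Vectorize a scalar loop. Straight-line code is not a loop, so it is skipped.

    The remainder (scalar epilogue) loop is ignored in this toy model.
    """
    if code.is_loop and code.vector_width == 1:
        trip = None if code.trip_count is None else code.trip_count // width
        return replace(code, trip_count=trip, vector_width=width)
    return code


def run(pipeline, code: Loop) -> Loop:
    """Apply each pass in order."""
    for p in pipeline:
        code = p(code)
    return code


# Hypothetical kernel: a 16-iteration element-wise add, 4 cost units per iteration.
kernel = Loop(trip_count=16, body_size=4)
unroll_first = run([full_unroll, loop_vectorize], kernel)     # stays scalar
vectorize_first = run([loop_vectorize, full_unroll], kernel)  # 4-wide vector code
```

In the unroll-first order the 16 × 4 = 64-unit loop fits the budget, is unrolled, and the vectorizer then has nothing to work on; in the vectorize-first order the 4-iteration vector loop is produced first and may still be unrolled afterwards. Passing a lower hypothetical threshold (for example `full_unroll(kernel, threshold=32)`) leaves the loop intact even in the unroll-first order, which is the kind of target tuning raised later in this thread.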

https://llvm.org/bugs/show_bug.cgi?id=25748
https://llvm.org/bugs/show_bug.cgi?id=28090

Benchmark run-time change in %, relative to llvm-trunk (4dfe58af9); lower is better, so negative values are speedups:
External/SPEC/CFP2000/177.mesa/177.mesa_secs 0.51
External/SPEC/CFP2000/179.art/179.art_secs 0.73
External/SPEC/CFP2000/183.equake/183.equake_secs 0.72
External/SPEC/CFP2000/188.ammp/188.ammp_secs 0.14
External/SPEC/CINT2000/164.gzip/164.gzip_secs -1.48
External/SPEC/CINT2000/175.vpr/175.vpr_secs 2.04
External/SPEC/CINT2000/176.gcc/176.gcc_secs -2.18
External/SPEC/CINT2000/181.mcf/181.mcf_secs -0.54
External/SPEC/CINT2000/186.crafty/186.crafty_secs 0.34
External/SPEC/CINT2000/197.parser/197.parser_secs 0.61
External/SPEC/CINT2000/252.eon/252.eon_secs 1.80
External/SPEC/CINT2000/253.perlbmk/253.perlbmk_secs -3.65
External/SPEC/CINT2000/254.gap/254.gap_secs -0.91
External/SPEC/CINT2000/255.vortex/255.vortex_secs 0.21
External/SPEC/CINT2000/256.bzip2/256.bzip2_secs -3.35
External/SPEC/CINT2000/300.twolf/300.twolf_secs -5.65

Worked in collaboration with Aditya Kumar and Sebastian Pop.


Event Timeline

laxmansole retitled this revision to "Moving loop vectorization pass before loop-unroller."
laxmansole updated this object.
sebpop added inline comments. Jun 20 2016, 12:50 PM
llvm/test/Transforms/LoopVectorize/LoopWithConstTripCount1.ll, line 19:

Misspelled: CHECK-LABLE should be CHECK-LABEL.

llvm/test/Transforms/LoopVectorize/LoopWithConstTripCount2.ll, line 25:

Misspelled CHECK-LABEL.

laxmansole updated this revision to Diff 61297. Jun 20 2016, 1:16 PM
laxmansole marked 2 inline comments as done.
hfinkel requested changes to this revision. Jun 20 2016, 11:25 PM
hfinkel edited edge metadata.

This is not the right solution to this problem. Loop vectorization is not a canonicalization pass, but a lowering pass, and should not be moved into the canonicalization part of the pipeline.

Should these loops be fully unrolled at all? The targets can influence the thresholds used for unrolling, and perhaps those need better tuning. After a loop is fully unrolled, if we're missing vectorization opportunities, why is the SLP vectorizer not catching them?
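The SLP question above can be made concrete with a toy model. The sketch below is a hypothetical and heavily simplified stand-in for SLP-style packing, not LLVM's SLPVectorizer: it takes the straight-line statements a full unroll of `dst[i] = src1[i] op src2[i]` would leave behind and greedily groups `width` isomorphic statements with consecutive indices into one vector operation. A real SLP vectorizer also builds trees from consecutive stores and checks costs and dependences, none of which is modeled here.

```python
from typing import List, NamedTuple, Tuple


class Stmt(NamedTuple):
    op: str     # e.g. "add"
    dst: str    # destination array
    src1: str   # first source array
    src2: str   # second source array
    index: int  # same index used for dst and both sources


def unroll(op: str, dst: str, src1: str, src2: str, trip_count: int) -> List[Stmt]:
    """Straight-line code left by fully unrolling dst[i] = src1[i] op src2[i]."""
    return [Stmt(op, dst, src1, src2, i) for i in range(trip_count)]


def slp_pack(stmts: List[Stmt], width: int = 4) -> Tuple[List[List[Stmt]], List[Stmt]]:
    """Greedily pack runs of `width` isomorphic, consecutive statements.

    Returns (packs, leftovers); leftovers stay scalar.
    """
    packs: List[List[Stmt]] = []
    leftovers: List[Stmt] = []
    i = 0
    while i < len(stmts):
        group = stmts[i:i + width]
        base = group[0]
        same_shape = all(
            (s.op, s.dst, s.src1, s.src2) == (base.op, base.dst, base.src1, base.src2)
            and s.index == base.index + k
            for k, s in enumerate(group)
        )
        if len(group) == width and same_shape:
            packs.append(group)
            i += width
        else:
            leftovers.append(stmts[i])
            i += 1
    return packs, leftovers


packs, leftovers = slp_pack(unroll("add", "c", "a", "b", trip_count=10))
```

With a hypothetical 10-iteration loop and a width of 4, the toy packer forms two 4-wide groups (indices 0-3 and 4-7) and leaves indices 8 and 9 scalar, which is the sense in which fully unrolled code can still, in principle, be vectorized after the fact.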

This revision now requires changes to proceed. Jun 20 2016, 11:25 PM