Page MenuHomePhabricator

[Passes] Run peeling as part of simple/full loop unrolling.
AcceptedPublic

Authored by fhahn on Sep 29 2020, 2:18 AM.

Details

Summary

Loop peeling removes conditions from loop bodies that become invariant
after a small number of iterations. When triggered, this leads to fewer
compares and possibly PHIs in loop bodies, enabling further
optimizations. The current cost-model of loop peeling should be quite
conservative/safe, i.e. only peel if a condition in the loop becomes
known after peeling.

For example, see PR47671, where loop peeling enables vectorization by
removing a PHI the vectorizer does not understand. Granted, the
loop-vectorizer could also be taught about constant PHIs, but loop
peeling is likely to enable other optimizations as well.

This has an impact on quite a few benchmarks from
MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example

Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsVectorized

Program                                        base   patch  diff
 test-suite...ve-susan/automotive-susan.test     8.00   9.00 12.5%
 test-suite...nal/skidmarks10/skidmarks.test    35.00  31.00 -11.4%
 test-suite...lications/sqlite3/sqlite3.test    41.00  43.00  4.9%
 test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test    25.00  26.00  4.0%
 test-suite...006/450.soplex/450.soplex.test    88.00  89.00  1.1%
 test-suite...TimberWolfMC/timberwolfmc.test   120.00 119.00 -0.8%
 test-suite.../CINT2006/403.gcc/403.gcc.test   215.00 216.00  0.5%
 test-suite...006/447.dealII/447.dealII.test   957.00 958.00  0.1%
 test-suite...ternal/HMMER/hmmcalibrate.test    75.00  75.00  0.0%

Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsAnalyzed

Program                                        base    patch   diff
 test-suite...ks/Prolangs-C/agrep/agrep.test   440.00  434.00  -1.4%
 test-suite...nal/skidmarks10/skidmarks.test   312.00  308.00  -1.3%
 test-suite...marks/7zip/7zip-benchmark.test   6399.00 6323.00 -1.2%
 test-suite...lications/minisat/minisat.test   134.00  135.00   0.7%
 test-suite...rks/FreeBench/pifft/pifft.test   295.00  297.00   0.7%
 test-suite...TimberWolfMC/timberwolfmc.test   1879.00 1869.00 -0.5%
 test-suite...pplications/treecc/treecc.test   689.00  691.00   0.3%
 test-suite...T2000/300.twolf/300.twolf.test   1593.00 1597.00  0.3%
 test-suite.../Benchmarks/Bullet/bullet.test   1394.00 1392.00 -0.1%
 test-suite...ications/JM/ldecod/ldecod.test   1431.00 1429.00 -0.1%
 test-suite...6/464.h264ref/464.h264ref.test   2229.00 2230.00  0.0%
 test-suite...lications/sqlite3/sqlite3.test   2590.00 2589.00 -0.0%
 test-suite...ications/JM/lencod/lencod.test   2732.00 2733.00  0.0%
 test-suite...006/453.povray/453.povray.test   3395.00 3394.00 -0.0%

Note the -11% regression in number of loops vectorized for skidmarks. I
suspect this corresponds to the fact that those loops are gone now (see
the reduction in number of loops analyzed by LV).

Diff Detail

Unit TestsFailed

TimeTest
850 mslinux > libarcher.parallel::parallel-nosuppression.c
Script: -- : 'RUN: at line 15'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/parallel/parallel-nosuppression.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-nosuppression.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0' /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-nosuppression.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-nosuppression.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/parallel/parallel-nosuppression.c

Event Timeline

fhahn created this revision.Sep 29 2020, 2:18 AM
Herald added a project: Restricted Project. · View Herald Transcript
fhahn requested review of this revision.Sep 29 2020, 2:18 AM

The patch doesn't seem to apply for me:

llvm-project$ arc patch --nobranch D88471 --skip-dependencies
You have untracked files in this working copy.

  Working copy: /repositories/llvm-project/

  Untracked changes in working copy:
  (To ignore these 4 change(s), add them to ".git/info/exclude".)
    llvm/CMakeLists.txt.user.4.9-pre1
    llvm/CMakeLists.txt.user.old
    llvm/lib/Transforms/IPO/PassManagerBuilder.cpp.rej
    llvm/utils/compare-stats.py

    Ignore these 4 untracked file(s) and continue? [y/N] y

 INFO  Base commit is not in local repository; trying to fetch.


    This diff is against commit 6c1ce7e277624882c6139b978f90a489d638aff8, but
    the commit is nowhere in the working copy. Try to apply it against the
    current working copy state? (0be51c09131f92651e5bad9ca0e3fb8fdc37d84a)
    [Y/n] y

Checking patch llvm/test/Transforms/PhaseOrdering/X86/peel-before-lv-to-enable-vectorization.ll...
error: llvm/test/Transforms/PhaseOrdering/X86/peel-before-lv-to-enable-vectorization.ll: does not exist in index
Checking patch llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp...
Checking patch llvm/lib/Transforms/IPO/PassManagerBuilder.cpp...
error: while searching for:
  if (EnableLoopInterchange)
    MPM.add(createLoopInterchangePass()); // Interchange loops

  // Unroll small loops
  MPM.add(createSimpleLoopUnrollPass(OptLevel, DisableUnrollLoops,
                                     ForgetAllSCEVInLoopUnroll));
  addExtensionsToPM(EP_LoopOptimizerEnd, MPM);

error: patch failed: llvm/lib/Transforms/IPO/PassManagerBuilder.cpp:445
error: while searching for:
  if (EnableLoopInterchange)
    PM.add(createLoopInterchangePass());

  // Unroll small loops
  PM.add(createSimpleLoopUnrollPass(OptLevel, DisableUnrollLoops,
                                    ForgetAllSCEVInLoopUnroll));
  PM.add(createLoopVectorizePass(true, !LoopVectorize));

error: patch failed: llvm/lib/Transforms/IPO/PassManagerBuilder.cpp:1036
Checking patch llvm/include/llvm/Transforms/Scalar/LoopUnrollPass.h...
Checking patch llvm/include/llvm/Transforms/Scalar.h...
Hunk #1 succeeded at 184 (offset 6 lines).
Applied patch llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp cleanly.
Applying patch llvm/lib/Transforms/IPO/PassManagerBuilder.cpp with 2 rejects...
Rejected hunk #1.
Rejected hunk #2.
Applied patch llvm/include/llvm/Transforms/Scalar/LoopUnrollPass.h cleanly.
Applied patch llvm/include/llvm/Transforms/Scalar.h cleanly.

 Patch Failed! 
Usage Exception: Unable to apply patch!
fhahn updated this revision to Diff 295788.Oct 2 2020, 4:30 AM

The patch doesn't seem to apply for me:

Rebased, should apply now.

In terms of runtime impact, I measured the following impact on SPEC2006 on ARM64 with -O3 -flto, outside the noise:

test-suite...006/450.soplex/450.soplex.test   -2.4%
test-suite...6/464.h264ref/464.h264ref.test    1.3%
test-suite...T2006/473.astar/473.astar.test   -0.9%

Presumably the h264ref regression is related to wrong decisions in the cost modeling and probably warrants some further analysis.

The patch doesn't seem to apply for me:

Rebased, should apply now.

It does, thank you.

In terms of runtime impact, I measured the following impact on SPEC2006 on ARM64 with -O3 -flto, outside the noise:

test-suite...006/450.soplex/450.soplex.test   -2.4%
test-suite...6/464.h264ref/464.h264ref.test    1.3%
test-suite...T2006/473.astar/473.astar.test   -0.9%

Presumably the h264ref regression is related to wrong decisions in the cost modeling and probably warrants some further analysis.

lebedev.ri accepted this revision.Oct 2 2020, 6:18 AM

This looks good to me FWIW, on the code i have checked this on,
this is a win, causing small perf improvement and apparently no regressions.

Not sure if other reviewers have other comments.

This revision is now accepted and ready to land.Oct 2 2020, 6:18 AM