Page MenuHomePhabricator

[Passes] Add extra LoopSimplifyCFG run after IndVarSimplify.
Changes PlannedPublic

Authored by fhahn on Apr 19 2021, 11:38 AM.

Details

Summary

IndVarSimplify may simplify branch conditions for blocks in loops, but
does not simplify conditional to unconditional branches.

This can pessimize later passes, like loop-delete, which check for
certain conditions across all blocks in a loop. This results in cases
where we fail to delete or vectorize loops, e.g. because they contain
dynamically dead blocks with calls with side effects.

Fixes #29339 (improved optimization).

It also can improve compile-time drastically by cleaning up blocks and branches after IndVarSimplify.

Fixes #49675 (slow compile time)
Fixes #51264 (slow compile time)
Fixes #51295 (slow compile time)

Compile-time impact:

  • O3 geomean: +0.04%
  • O3 ThinLTO: +0.09%
  • O3 LTO : +0.19%

https://llvm-compile-time-tracker.com/compare.php?from=ead7a5fc04360c1d740214048b15c4d7100dc768&to=e12d7d188de97efd13f36fa8fc8bf5f4298d252f&stat=instructions

This shakes up quite a few things, with quite a few stats changes across
MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3

e.g.

Same hash: 203 (filtered out)
Remaining: 34
Metric: loop-vectorize.LoopsVectorized

Program                                        base   patch  diff
 test-suite...6/464.h264ref/464.h264ref.test   156.00 159.00  1.9%
 test-suite...marks/7zip/7zip-benchmark.test   336.00 341.00  1.5%
 test-suite...T2006/445.gobmk/445.gobmk.test    90.00  90.00  0.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   196.00 196.00  0.0%

 Same hash: 203 (filtered out)
Remaining: 34
Metric: scalar-evolution.NumTripCountsComputed

Program                                        base     patch    diff
 test-suite...nch/fourinarow/fourinarow.test    64.00    81.00   26.6%
 test-suite...6/464.h264ref/464.h264ref.test   5330.00  5969.00  12.0%
 test-suite...cations/hexxagon/hexxagon.test    77.00    84.00    9.1%
 test-suite...ications/JM/lencod/lencod.test   5227.00  5660.00   8.3%
 test-suite...ications/JM/ldecod/ldecod.test   2396.00  2592.00   8.2%
 test-suite.../Prolangs-C/loader/loader.test    13.00    14.00    7.7%
 test-suite...nal/skidmarks10/skidmarks.test   933.00   993.00    6.4%
 test-suite...lications/viterbi/viterbi.test    49.00    52.00    6.1%
 test-suite...nsumer-lame/consumer-lame.test   1715.00  1779.00   3.7%
 test-suite...T2006/445.gobmk/445.gobmk.test   2611.00  2687.00   2.9%
 test-suite...marks/7zip/7zip-benchmark.test   4512.00  4642.00   2.9%
 test-suite...CI_Purple/SMG2000/smg2000.test   2007.00  2057.00   2.5%
 test-suite...CFP2000/188.ammp/188.ammp.test   741.00   759.00    2.4%
 test-suite...006/453.povray/453.povray.test   2504.00  2560.00   2.2%
 test-suite...:: External/Povray/povray.test   2583.00  2633.00   1.9%
 test-suite.../CINT2000/176.gcc/176.gcc.test   2602.00  2650.00   1.8%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test   1307.00  1326.00   1.5%
 test-suite...nsumer-jpeg/consumer-jpeg.test   1332.00  1351.00   1.4%
 test-suite...TimberWolfMC/timberwolfmc.test   1283.00  1299.00   1.2%
 test-suite...006/447.dealII/447.dealII.test   22333.00 22579.00  1.1%
 test-suite...T2006/456.hmmer/456.hmmer.test   1671.00  1688.00   1.0%
 test-suite...ternal/HMMER/hmmcalibrate.test   1672.00  1689.00   1.0%
 test-suite...peg2/mpeg2dec/mpeg2decode.test   497.00   502.00    1.0%

Diff Detail

Unit TestsFailed

TimeTest
60,080 msx64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/scariness_score_test.cpp.tmp
730 msx64 debian > HWAddressSanitizer-x86_64.TestCases::hwasan_symbolize.cpp
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -fsanitize-hwaddress-experimental-aliasing -mllvm -hwasan-generate-tags-with-calls=1 -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions -Wl,--build-id -g /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/hwasan/TestCases/hwasan_symbolize.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/hwasan_symbolize.cpp.tmp

Event Timeline

fhahn created this revision.Apr 19 2021, 11:38 AM
fhahn requested review of this revision.Apr 19 2021, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2021, 11:38 AM
fhahn edited the summary of this revision. (Show Details)May 10 2022, 9:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2022, 9:25 AM
Herald added a subscriber: ctetreau. · View Herald Transcript
fhahn edited the summary of this revision. (Show Details)May 10 2022, 9:35 AM
fhahn retitled this revision from [Passes] Add extra LoopSimplifyCFG run after IndVarSimplify (WIP). to [Passes] Add extra LoopSimplifyCFG run after IndVarSimplify..May 10 2022, 9:43 AM
fhahn added reviewers: reames, aeubanks, nikic, asbirlea.
fhahn updated this revision to Diff 428412.May 10 2022, 9:46 AM

Fix failing tests.

This patch increases compile-time overall a bit (details in description). The impact could likely be even small if we only run LoopSimplifyCFG only on loops that indvars changed. This could be done by either exposing a simplifyLoopCFG helper function and use it in indvars or adding a way to mark loops for which LoopSimplifyCFG should run (similar in spirit to D115052).

I'm curious what people think :)

nikic added a comment.May 10 2022, 9:51 AM

Would it be possible to add a PhaseOrdering test that shows an optimization improvement (e.g. the #29339 one)?

Would it be possible to add a PhaseOrdering test that shows an optimization improvement (e.g. the #29339 one)?

I just tried, but it looks like this particular issue has been fixed in 14.0 and it looks like now SimplifyCFG is able to simplify the branches late. I *could* try to extract a phase ordering test for one of the cases where we now vectorize?

This could be done by either exposing a simplifyLoopCFG helper function ...

This variant sounds better, if possible.

This could be done by either exposing a simplifyLoopCFG helper function ...

This variant sounds better, if possible.

+1, at least for this case where it's a simple cleanup

in other cases, such as D125293 and D115052 where we want to do more general cleanup due to added/lowered code, I believe @asbirlea has future plans to add new PM infra to conditionally run passes for simplification pipeline work

This could be done by either exposing a simplifyLoopCFG helper function ...

This variant sounds better, if possible.

+1, at least for this case where it's a simple cleanup

in other cases, such as D125293 and D115052 where we want to do more general cleanup due to added/lowered code, I believe @asbirlea has future plans to add new PM infra to conditionally run passes for simplification pipeline work

Sorry for the late reply here. For this specific case, always running LoopSimplifyCFGPass adds some compile-time and it seems unnecessary to add a hack for running it conditionally on the previous pass. So exposing a helper to be called directly from IndVarSimplifyPass makes more sense.

For additional context, I've been looking into the motivation for adding the enable-no-rerun-simplification-pipeline flag ( D113947) and what it would take to turn it on. Related to this, there are multiple cases of phase ordering issues that may need passes to be conditionally run or rerun, so I'm looking at a longer term design for the pass pipeline.

fhahn planned changes to this revision.May 13 2022, 10:03 AM

I'll update the patch.

At the moment, there appears to be an unfortunate side effect of running loop-simplify-cfg: it invalidates SCEVs for a simplified loop, which can pessimize SCEV trip count computations, e.g. because IndVarSimplify re-writes the bounds so that they are harder for SCEV to handle. I think before proceeding here we would need to make sure SCEV handles the re-written bounds as well as possible.