This is an archive of the discontinued LLVM Phabricator instance.

Split SimplifyCFG to run obscuring switch transforms only during last phase
ClosedPublic

Authored by joerg on Feb 24 2017, 5:17 AM.

Details

Summary

The SimplifyCFG pass currently contains two transformations for switches that can obscure flow and inhibit other optimisations:

  1. switch of constants is replaced with look-up table
  2. sparse switches are replaced with transformation of the condition to make them dense

Most importantly, both transformations can benefit from DCE as a result of CVP and inlining. At the same time, they can inhibit inlining when run before the corresponding module pass as part of the early function cleanup. Neither transform looks like it is very useful early on, so split the existing simplifycfg pass into two and replace the last instance in the normal pass pipeline with the latesimplifycfg pass.

Initial testing of a version covering only point (1) of the list above fixes the regression in PR 32003. A LNT run by Mehdi Amini of the same versions shows:
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk 2.5% slower
SingleSource/Benchmarks/Misc/matmul_f64_4x4 34.2% faster
SingleSource/Benchmarks/Misc/fp-convert 3.0% faster
MultiSource/Benchmarks/Prolangs-C++/life/life 2.8% faster

Diff Detail

Event Timeline

joerg created this revision.Feb 24 2017, 5:17 AM
efriedma added inline comments.
lib/Transforms/IPO/PassManagerBuilder.cpp
377

This is still interleaved with inlining; is that intentional?

joerg added a comment.Feb 24 2017, 1:52 PM

LNT run with this version shows a consistent 1%-2% improvement in both compile and execution time for many instances. One execution time regression remains.

joerg added inline comments.Mar 4 2017, 5:33 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
377

Inlining is done in line 398 or 476, this function is called from 485. Is my understanding of the pass ordering wrong?
The intention is definitely to defer the switch transforms until all inlining is done.

The inliner is interleaved with the passes from addFunctionSimplificationPasses: first we simplify a leaf function, then we inline that function into another function, then we simplify the caller, etc. If you want to see this in action, try looking at the output of echo "int a(); int b() { return a(); } int c() { return b(); }" | clang -x c - -o - -S -O2 -mllvm -print-after-all |& less.

mehdi_amini edited edge metadata.Mar 6 2017, 5:20 PM

My favorite way is: echo "" | clang -x c - -c -mllvm -debug-pass=Structure -O2

Profile summary info
  ModulePass Manager
    [...]
    FunctionPass Manager
      [...]
      Simplify the CFG
    [...]
    Call Graph SCC Pass Manager
      Remove unused exception handling info
      Function Integration/Inlining
      Deduce function attributes
      FunctionPass Manager
        [...]
        Simplify the CFG
        [...]
        Simplify the CFG

The indentation shows the nesting, and the Call Graph SCC Pass Manager will schedule whatever is nested under it to run first on a callee before moving to the caller (SCC wise).

joerg updated this revision to Diff 91266.Mar 9 2017, 10:12 PM

Move the full version out of the function simplification set into the late optimisation steps after all function inlining is done.

mehdi_amini added inline comments.Mar 9 2017, 10:17 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
620

The comment is misleading, it seems that the only reason we run the pass here is for this purpose.

lib/Transforms/Scalar/SimplifyCFGPass.cpp
256

Why two passes instead of a parameter?

lib/Transforms/Utils/SimplifyCFG.cpp
173

Document.

5568

Comment.

6024

Document the parameter in the doxygen.

mehdi_amini added inline comments.Mar 9 2017, 10:44 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
590

What about this one?

joerg marked 2 inline comments as done.Mar 9 2017, 11:01 PM
joerg added inline comments.
lib/Transforms/IPO/PassManagerBuilder.cpp
590

I don't have a strong opinion on whether it should be done before or after SLP handling. I don't think the code is likely to be vectorised.

620

Well, we also want to do the usual transformations again after loop unrolling. But at this point, we are also commiting to the the lookup table conversions.

lib/Transforms/Scalar/SimplifyCFGPass.cpp
256

Two passes makes it easier to test it on the command line.

jmolloy edited edge metadata.Mar 10 2017, 2:17 AM

Looks good on my side, but please wait for Mehdi.

mehdi_amini added inline comments.Mar 10 2017, 7:41 AM
lib/Transforms/Scalar/SimplifyCFGPass.cpp
256

I can understand this aspect, but on the other hand that does not scale with adding more options, which is why we rarely do this. What other passes have the same approach?

joerg added inline comments.Mar 10 2017, 10:34 PM
lib/Transforms/Scalar/SimplifyCFGPass.cpp
256

I don't think we have a good example where parts of a pass are disabled. I could split them off into a separate pass, but that seems to be even worse. Normally we only have adjustable limits as configuration of passes.

joerg updated this revision to Diff 91455.Mar 10 2017, 10:36 PM

Add comments on why the split pass is useful.

mehdi_amini accepted this revision.Mar 14 2017, 5:06 PM

LGTM.

lib/Transforms/Scalar/SimplifyCFGPass.cpp
256

Normally we only have adjustable limits as configuration of passes ; I don't understand what you meant here?

What I was referring to was things like createFunctionInliningPass(Threshold);, createLoopRotatePass(SizeLevel == 2 ? 0 : -1), createSimpleLoopUnrollPass(OptLevel), and createLoopVectorizePass(true, LoopVectorize).

This revision is now accepted and ready to land.Mar 14 2017, 5:06 PM
This revision was automatically updated to reflect the committed changes.