This is an archive of the discontinued LLVM Phabricator instance.

[GVNHoist] Move GVNHoist to function simplification part of pipeline.
ClosedPublic

Authored by gberry on Dec 13 2016, 11:19 AM.

Details

Summary

Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline. The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:

Benchmarks/CoyoteBench/fftbench  -24.952%
spec2006/bzip2                    -4.071%
internal bmark                    -3.177%
Benchmarks/PAQ8p/paq8p            -1.754%
spec2000/perlbmk                  -1.328%
spec2006/h264ref                  -1.140%

Regressions:

internal bmark                    +1.818%
Benchmarks/mafft/pairlocalalign   +1.084%

Diff Detail

Repository
rL LLVM

Event Timeline

gberry updated this revision to Diff 81261.Dec 13 2016, 11:19 AM
gberry retitled this revision from to [GVNHoist] Move GVNHoist to function simplification part of pipeline..
gberry updated this object.
gberry added reviewers: sebpop, dberlin, hiraditya.
gberry added a subscriber: llvm-commits.
hfinkel accepted this revision.Dec 13 2016, 11:38 AM
hfinkel added a reviewer: hfinkel.
hfinkel added a subscriber: hfinkel.

Improvements:

Nice! I'm surprised it was not there before; it is not an early-cleanup pass. Putting it right after EarlyCSE should allow it to share a MemorySSA computation with EarlyCSE (at least when we switch it over to using MSSA). LGTM.

This revision is now accepted and ready to land.Dec 13 2016, 11:38 AM
sebpop accepted this revision.Dec 13 2016, 11:45 AM
sebpop edited edge metadata.

Overall LGTM.
In the past I have seen some improvements to function inlining due to hoisting happening before inlining.
Also there may be some sinking (in cfg-simplify) happening before we have a chance to hoist expressions.
Let's commit this, and I will report if I see perf degradations, in which case we may as well run hoisting before and after inlining.

Thanks for your patch!

Improvements:

Nice! I'm surprised it was not there before; it is not an early-cleanup pass. Putting it right after EarlyCSE should allow it to share a MemorySSA computation with EarlyCSE (at least when we switch it over to using MSSA). LGTM.

Yeah, that (amortizing MemorySSA for EarlyCSE) was actually my main motivation for doing this.

Improvements:

Nice! I'm surprised it was not there before; it is not an early-cleanup pass. Putting it right after EarlyCSE should allow it to share a MemorySSA computation with EarlyCSE (at least when we switch it over to using MSSA). LGTM.

Improvements:

Nice! I'm surprised it was not there before; it is not an early-cleanup pass. Putting it right after EarlyCSE should allow it to share a MemorySSA computation with EarlyCSE (at least when we switch it over to using MSSA). LGTM.

sebpop added inline comments.Dec 13 2016, 12:09 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
253 ↗(On Diff #81261)

Could you please try to see the perf of keeping the hoist pass here and adding an extra one below inline as you did?

gberry added inline comments.Dec 14 2016, 10:37 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
253 ↗(On Diff #81261)

I tried this and the results were very few tests changed, and small net change (positive for one suite, negative for another), leading me to believe that running GVNHoist twice isn't worth the extra compile time. Below are the percent differences in run time between later GVNHoist (this patch) and both early and later GVNHoist.

`TSVC/InductionVariable-flt -5.168%
spec2000/crafty -1.722%
spec2006/mcf +1.130%
Applications/sqlite3 +1.322%
TSVC/ControlFlow-dbl +3.578%`

This revision was automatically updated to reflect the committed changes.
eugenis edited edge metadata.Dec 14 2016, 4:38 PM
eugenis added subscribers: dvyukov, kcc.
eugenis added a subscriber: eugenis.

FYI, this has regressed ThreadSanitizer runtime library codegen and broke this bot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/2442

One of the performance-critical functions have gone up from 5 spills to 6 (x86_64).

I'm working on a small test case.