This is an archive of the discontinued LLVM Phabricator instance.

[GVNHoist] Move GVNHoist to function simplification part of pipeline.
ClosedPublic

Authored by gberry on Dec 13 2016, 11:19 AM.

Details

Reviewers

sebpop
hiraditya
• dberlin
hfinkel

Commits

rGca11a1e14769: [GVNHoist] Move GVNHoist to function simplification part of pipeline.
rL289696: [GVNHoist] Move GVNHoist to function simplification part of pipeline.

Summary

Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline. The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:

Benchmarks/CoyoteBench/fftbench  -24.952%
spec2006/bzip2                    -4.071%
internal bmark                    -3.177%
Benchmarks/PAQ8p/paq8p            -1.754%
spec2000/perlbmk                  -1.328%
spec2006/h264ref                  -1.140%

Regressions:

internal bmark                    +1.818%
Benchmarks/mafft/pairlocalalign   +1.084%

Diff Detail

Repository: rL LLVM

Event Timeline

gberry updated this revision to Diff 81261. Dec 13 2016, 11:19 AM

gberry retitled this revision to [GVNHoist] Move GVNHoist to function simplification part of pipeline.

gberry updated this object.

gberry added reviewers: sebpop, • dberlin, hiraditya.

gberry added a subscriber: llvm-commits.

Herald added subscribers: mcrosier, mehdi_amini, aemerson. Dec 13 2016, 11:19 AM

Improvements:

Nice! I'm surprised it was not there before; it is not an early-cleanup pass. Putting it right after EarlyCSE should allow it to share a MemorySSA computation with EarlyCSE (at least when we switch it over to using MSSA). LGTM.

This revision is now accepted and ready to land. Dec 13 2016, 11:38 AM

Overall LGTM.
In the past I have seen some improvements to function inlining due to hoisting happening before inlining.
Also there may be some sinking (in cfg-simplify) happening before we have a chance to hoist expressions.
Let's commit this, and I will report if I see perf degradations, in which case we may as well run hoisting before and after inlining.

Thanks for your patch!

In D27722#621354, @hfinkel wrote:

Improvements:

Nice! I'm surprised it was not there before; it is not an early-cleanup pass. Putting it right after EarlyCSE should allow it to share a MemorySSA computation with EarlyCSE (at least when we switch it over to using MSSA). LGTM.

Yeah, that (amortizing MemorySSA for EarlyCSE) was actually my main motivation for doing this.

sebpop added inline comments. Dec 13 2016, 12:09 PM

lib/Transforms/IPO/PassManagerBuilder.cpp
Line 253 (On Diff #81261): Could you please check the performance of keeping the hoist pass here and adding an extra one below the inliner, as you did?

gberry added inline comments. Dec 14 2016, 10:37 AM

lib/Transforms/IPO/PassManagerBuilder.cpp
Line 253 (On Diff #81261): I tried this. Very few tests changed, and the net change was small (positive for one suite, negative for another), which leads me to believe that running GVNHoist twice isn't worth the extra compile time. Below are the percent differences in run time between later GVNHoist (this patch) and both early and later GVNHoist.

TSVC/InductionVariable-flt        -5.168%
spec2000/crafty                   -1.722%
spec2006/mcf                      +1.130%
Applications/sqlite3              +1.322%
TSVC/ControlFlow-dbl              +3.578%

Closed by commit rL289696: [GVNHoist] Move GVNHoist to function simplification part of pipeline. (authored by gberry). Dec 14 2016, 11:48 AM

This revision was automatically updated to reflect the committed changes.

FYI, this has regressed ThreadSanitizer runtime library codegen and broken this bot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/2442

One of the performance-critical functions has gone up from 5 spills to 6 (x86_64).

I'm working on a small test case.

Filed https://llvm.org/bugs/show_bug.cgi?id=31382

gberry mentioned this in D28380: [SelectionDAG] Handle inverted conditions when splitting into multiple branches. Jan 12 2017, 3:00 PM

Revision Contents

Path                                                    Size
llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp    4 lines
llvm/trunk/test/Transforms/GVNHoist/hoist-inline.ll     38 lines

Diff 81432

llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp

[243 lines not shown] void PassManagerBuilder::populateFunctionPassManager(

   if (OptLevel == 0) return;

   addInitialAliasAnalysisPasses(FPM);

   FPM.add(createCFGSimplificationPass());
   FPM.add(createSROAPass());
   FPM.add(createEarlyCSEPass());
-  if(EnableGVNHoist)
-    FPM.add(createGVNHoistPass());
   FPM.add(createLowerExpectIntrinsicPass());
 }

 // Do PGO instrumentation generation or use pass as the option specified.
 void PassManagerBuilder::addPGOInstrPasses(legacy::PassManagerBase &MPM) {
   if (!EnablePGOInstrGen && PGOInstrUse.empty())
     return;
   // Perform the preinline and cleanup passes for O1 and above.
[28 lines not shown]
   if (!PGOInstrUse.empty())
     MPM.add(createPGOInstrumentationUseLegacyPass(PGOInstrUse));
 }
 void PassManagerBuilder::addFunctionSimplificationPasses(
     legacy::PassManagerBase &MPM) {
   // Start of function pass.
   // Break up aggregate allocas, using SSAUpdater.
   MPM.add(createSROAPass());
   MPM.add(createEarlyCSEPass()); // Catch trivial redundancies
+  if(EnableGVNHoist)
+    MPM.add(createGVNHoistPass());
   // Speculative execution if the target has divergent branches; otherwise nop.
   MPM.add(createSpeculativeExecutionIfHasBranchDivergencePass());
   MPM.add(createJumpThreadingPass()); // Thread jumps.
   MPM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals
   MPM.add(createCFGSimplificationPass()); // Merge & remove BBs
   // Combine silly seq's
   addInstructionCombiningPass(MPM);
   if (SizeLevel == 0 && !DisableLibCallsShrinkWrap)
[649 lines not shown]

llvm/trunk/test/Transforms/GVNHoist/hoist-inline.ll

; RUN: opt -S -O2 < %s | FileCheck %s

; Check that the inlined loads are hoisted.
; CHECK-LABEL: define i32 @fun(
; CHECK-LABEL: entry:
; CHECK: load i32, i32* @A
; CHECK: if.then:

@A = external global i32
@B = external global i32
@C = external global i32

define i32 @loadA() {
  %a = load i32, i32* @A
  ret i32 %a
}

define i32 @fun(i1 %c) {
entry:
  br i1 %c, label %if.then, label %if.else

if.then:
  store i32 1, i32* @B
  %call1 = call i32 @loadA()
  store i32 2, i32* @C
  br label %if.endif

if.else:
  store i32 2, i32* @C
  %call2 = call i32 @loadA()
  store i32 1, i32* @B
  br label %if.endif

if.endif:
  %ret = phi i32 [ %call1, %if.then ], [ %call2, %if.else ]
  ret i32 %ret
}
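
Read at the source level, the hoist-inline.ll test above has roughly the following shape. This is only an illustrative C++ sketch, not part of the patch: the names funBeforeHoist, funAfterHoist and shapesAgree are hypothetical, and the ints A, B and C stand in for the test's @A, @B and @C. It assumes, as in the test, that the three globals are distinct objects, so the stores to B and C cannot change the value loaded from A. The two loads only become visible inside fun once loadA has been inlined, which is why the summary argues for running GVNHoist after the callees are inlined.

```cpp
// Stand-ins for the test's @A, @B and @C globals (hypothetical C++ mirror).
int A = 0, B = 0, C = 0;

// Mirrors @loadA: a trivial callee that the inliner folds into its caller.
static int loadA() { return A; }

// Shape of @fun as written: each branch loads A through its own call.
int funBeforeHoist(bool c) {
  int ret;
  if (c) {
    B = 1;
    ret = loadA();
    C = 2;
  } else {
    C = 2;
    ret = loadA();
    B = 1;
  }
  return ret;
}

// Shape the CHECK lines look for: a single load of A in the entry block,
// placed before the branch to if.then / if.else.
int funAfterHoist(bool c) {
  int a = A;
  if (c) {
    B = 1;
    C = 2;
  } else {
    C = 2;
    B = 1;
  }
  return a;
}

// Returns true if both shapes return the same value and leave B and C
// in the same final state for the given branch condition.
bool shapesAgree(bool c, int initialA) {
  A = initialA; B = 0; C = 0;
  int r1 = funBeforeHoist(c);
  int b1 = B, c1 = C;

  A = initialA; B = 0; C = 0;
  int r2 = funAfterHoist(c);

  return r1 == r2 && r1 == initialA && b1 == B && c1 == C;
}
```

Calling shapesAgree with either branch condition checks that moving the load above the branch preserves the observable behavior in this simplified model; it says nothing about the register-pressure effects discussed in the comments above.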