This is an archive of the discontinued LLVM Phabricator instance.

Differential D15995

[LTO] Add a run of LoopUnroll
ClosedPublic

Authored by jmolloy on Jan 8 2016, 7:29 AM.

Details

Reviewers

jmolloy
mehdi_amini

Summary

Loop trip counts can often be resolved during LTO. We should obviously be unrolling small loops once those trip counts have been resolved, but we weren't.

This causes no change in a -flto run of the test-suite for me, but the nature of fully unrolling loops is that when they trigger, they cause massive changes in runtime. We observe this on third-party test suites.
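
To illustrate the kind of loop the summary describes, here is a minimal sketch that is not taken from the patch. The names `sumFirst`, `sumFour`, and `sumFourUnrolled` are hypothetical. Compiled in isolation, `sumFirst` has an unknown trip count; if LTO makes the caller visible and inlining propagates the constant 4, the trip count becomes known and the loop is a candidate for full unrolling into straight-line code.

```cpp
// Hypothetical library function: the trip count `n` is unknown when this
// translation unit is compiled on its own.
int sumFirst(const int *data, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += data[i];
  return total;
}

// Hypothetical caller, imagined to live in a different translation unit.
// Once inlined during LTO, the loop above has the constant trip count 4.
int sumFour(const int *data) { return sumFirst(data, 4); }

// Hand-written equivalent of what full unrolling could produce.
int sumFourUnrolled(const int *data) {
  return data[0] + data[1] + data[2] + data[3];
}
```

Both `sumFour` and `sumFourUnrolled` compute the same value; the second form has no loop counter or branch.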

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 44329. Jan 8 2016, 7:29 AM

jmolloy retitled this revision from to [LTO] Add a run of LoopUnroll.

jmolloy updated this object.

jmolloy added a reviewer: mehdi_amini.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

Herald added a subscriber: mehdi_amini. Jan 8 2016, 7:29 AM

mehdi_amini added inline comments. Jan 8 2016, 8:33 AM

lib/Transforms/IPO/PassManagerBuilder.cpp
576	Note: it is inserted before LoopInterchangePass in the regular pipeline? Also there is a comment before another insertion `// BBVectorize may have significantly shortened a loop body; unroll again.`. Could this be valid for the Loop and SLP vectorizers that are inserted below?

hfinkel added a subscriber: hfinkel. Jan 8 2016, 8:45 AM

hfinkel added inline comments.

lib/Transforms/IPO/PassManagerBuilder.cpp
576	> Also there is a comment before another insertion `// BBVectorize may have significantly shortened a loop body; unroll again.`. Could this be valid for the Loop and SLP vectorizers that are inserted below?

Yes, although for the loop vectorizer there is a competing effect: the loop vectorizer does not vectorize loops with known constant trip counts below some threshold (TinyTripCountVectorThreshold, 16 by default).
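
The competing effect can be modeled as a simple predicate. This is a hypothetical sketch of the decision as described in the comment, not the loop vectorizer's actual code: the `TripCount` struct and the `shouldSkipVectorization` helper are assumptions made for illustration, and the value 16 is the default quoted above.

```cpp
// Default quoted in the review comment above.
constexpr unsigned TinyTripCountVectorThreshold = 16;

// Hypothetical encoding of a trip count that may or may not be a known
// compile-time constant.
struct TripCount {
  bool Known;     // true if the trip count is a known constant
  unsigned Value; // meaningful only when Known is true
};

// Hypothetical helper: true for loops with a known constant trip count below
// the threshold, which is the case the vectorizer is said to leave alone.
constexpr bool shouldSkipVectorization(TripCount TC) {
  return TC.Known && TC.Value < TinyTripCountVectorThreshold;
}
```

Under this model, a loop that LTO resolves to four iterations would be skipped by the vectorizer and left for the unroller, while a loop with an unknown trip count would still be considered for vectorization.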

Hi Mehdi,

> Note: it is inserted before LoopInterchangePass in the regular pipeline?

In the regular pipeline there are many runs of loop unrolling. This extra unroll will only unroll loops completely - it'll never do partial unrolling. So the ideal place to do this is when loops are most countable and when trivial loops have been deleted. That's why I put this after LoopDeletion.

I agree that a cleanup Unroll after Vectorize seems a good idea. I'll add that.

James
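
The difference between complete and partial unrolling can be shown by hand. This is an illustrative sketch with hypothetical function names, not compiler output: with a known trip count of 4, complete unrolling removes the loop entirely, while partial unrolling by a factor of 2 keeps a loop with a wider body.

```cpp
// Original loop with a known constant trip count of 4.
int dotRolled(const int *a, const int *b) {
  int sum = 0;
  for (int i = 0; i < 4; ++i)
    sum += a[i] * b[i];
  return sum;
}

// Complete unrolling: no loop, no induction variable, no branch.
int dotFullyUnrolled(const int *a, const int *b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Partial unrolling by 2: the loop remains but runs half as many iterations.
int dotPartiallyUnrolled(const int *a, const int *b) {
  int sum = 0;
  for (int i = 0; i < 4; i += 2) {
    sum += a[i] * b[i];
    sum += a[i + 1] * b[i + 1];
  }
  return sum;
}
```

All three functions return the same result; only the first and third still contain a loop that later passes would have to reason about.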

jmolloy updated this revision to Diff 44626. Jan 12 2016, 5:44 AM

Mehdi accepted this on IRC on the condition that the first unroll invocation be swapped with LoopInterchange, so that the order is identical to the non-LTO pipeline.

This revision is now accepted and ready to land. Jan 14 2016, 6:58 AM

r257767

Revision Contents

Path: lib/Transforms/IPO/PassManagerBuilder.cpp
Size: 5 lines

Diff 44626

lib/Transforms/IPO/PassManagerBuilder.cpp

(564 lines not shown)

In void PassManagerBuilder::addLTOOptimizationPasses(legacy::PassManagerBase &PM):

   PM.add(createMemCpyOptPass()); // Remove dead memcpys.

   // Nuke dead stores.
   PM.add(createDeadStoreEliminationPass());

   // More loops are countable; try to optimize them.
   PM.add(createIndVarSimplifyPass());
   PM.add(createLoopDeletionPass());
+  if (!DisableUnrollLoops)
+    PM.add(createSimpleLoopUnrollPass()); // Unroll small loops
   if (EnableLoopInterchange)
     PM.add(createLoopInterchangePass());

   PM.add(createLoopVectorizePass(true, LoopVectorize));
+  // The vectorizer may have significantly shortened a loop body; unroll again.
+  if (!DisableUnrollLoops)
+    PM.add(createLoopUnrollPass());

   // Now that we've optimized loops (in particular loop induction variables),
   // we may have exposed more scalar opportunities. Run parts of the scalar
   // optimizer again at this point.
   PM.add(createInstructionCombiningPass()); // Initial cleanup
   PM.add(createCFGSimplificationPass()); // if-convert
   PM.add(createSCCPPass()); // Propagate exposed constants
   PM.add(createInstructionCombiningPass()); // Clean up again

(153 lines not shown)
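
The ordering in this diff can be modeled without LLVM. The following is a toy sketch: `ToyPassManager`, `addLoopPasses`, and the string pass names are hypothetical stand-ins for `legacy::PassManagerBase` and the `create*Pass()` factories. Only the flag names and the relative order of the passes come from the diff.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a pass manager: it only records pass names.
struct ToyPassManager {
  std::vector<std::string> Passes;
  void add(const std::string &Name) { Passes.push_back(Name); }
};

// Mirrors the order of the loop passes shown in the diff.
void addLoopPasses(ToyPassManager &PM, bool DisableUnrollLoops,
                   bool EnableLoopInterchange) {
  PM.add("IndVarSimplify");
  PM.add("LoopDeletion");
  if (!DisableUnrollLoops)
    PM.add("SimpleLoopUnroll"); // Unroll small loops
  if (EnableLoopInterchange)
    PM.add("LoopInterchange");
  PM.add("LoopVectorize");
  // The vectorizer may have shortened a loop body; unroll again.
  if (!DisableUnrollLoops)
    PM.add("LoopUnroll");
}
```

With unrolling enabled, the recorded sequence contains both unroll runs, one after loop deletion and one after vectorization. With `DisableUnrollLoops` set, both are omitted and the rest of the order is unchanged.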