This is an archive of the discontinued LLVM Phabricator instance.

[LoopReroll] Run LICM before Loop Rerolling.
ClosedPublic

Authored by jmolloy on Feb 16 2015, 9:08 AM.

Download Raw Diff

Details

Reviewers

jmolloy
hfinkel

Summary

Loop rerolling bails out if any value is not used by any iteration. Make sure all invariant values are hoisted first.

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 20037.Feb 16 2015, 9:08 AM

jmolloy retitled this revision from to [LoopReroll] Run LICM before Loop Rerolling..

jmolloy updated this object.

jmolloy edited the test plan for this revision. (Show Details)

jmolloy added a reviewer: hfinkel.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: Unknown Object (MLST).

Two thoughts:

Is it hard to teach the unroller to ignore loop-invariant values? [SE has a function for that].
Except under iteration from the CGSCC pass manager, we don't run LICM again after loop rerolling. If we're still finding loop invariant values in loops at that point in the pass sequence, it seems like we should run this again generically (regardless of whether we might reroll or not).

Hi Hal,

Hmm yeah, good questions. I can do (1). I hadn't remembered that SE had a function for this - I was trying to avoid implementing LICM myself!

But I'm not sure if I should do (1) because of (2). In my testcase, which is fairly difficult to reduce, we end up with a stray invariant multiply which is not the cheapest thing in the world. Perhaps adding another LICM regardless of Rerolling is the right way to go after all?

In D7674#124359, @jmolloy wrote:

Hi Hal,

Hmm yeah, good questions. I can do (1). I hadn't remembered that SE had a function for this - I was trying to avoid implementing LICM myself!

But I'm not sure if I should do (1) because of (2). In my testcase, which is fairly difficult to reduce,

This is not difficult to reduce ;) -- add an assert in the loop reroller, and let bugpoint do it. We need to understand where this comes from so we can sensibly talk about adjusting the pass manager configuration.

we end up with a stray invariant multiply which is not the cheapest thing in the world. Perhaps adding another LICM regardless of Rerolling is the right way to go after all?

I vote for both.

Hi Hal,

OK, I managed to track it down (bugpoint didn't help, but I strained my eyes over every pass until I'd worked it out :) ).

It comes from loop unrolling - we unroll part of a loop nest but don't run LICM afterwards to clean up. New patch adds it to the cleanup part of the scalar optimizer.

As this was a latent bug, I don't think we need to smarten up the reroller to handle this - we should be able to expect our input to be optimized. Do you agree with this?

Cheers,

James

In D7674#124383, @jmolloy wrote:

Hi Hal,

OK, I managed to track it down (bugpoint didn't help, but I strained my eyes over every pass until I'd worked it out :) ).

It comes from loop unrolling - we unroll part of a loop nest but don't run LICM afterwards to clean up. New patch adds it to the cleanup part of the scalar optimizer.

Okay, nice!

As this was a latent bug, I don't think we need to smarten up the reroller to handle this - we should be able to expect our input to be optimized. Do you agree with this?

Yea, that's why I didn't handle them in the first place.

Cheers,

James

Thanks Hal - committed in r229419.

This revision is now accepted and ready to land.Feb 16 2015, 11:02 AM

jmolloy closed this revision.Mar 18 2015, 3:27 AM

Revision Contents

Path

Size

lib/

Transforms/

IPO/

PassManagerBuilder.cpp

6 lines

Diff 20037

lib/Transforms/IPO/PassManagerBuilder.cpp

Show First 20 Lines • Show All 256 Lines • ▼ Show 20 Lines	void PassManagerBuilder::populateModulePassManager(
MPM.add(createInstructionCombiningPass());		MPM.add(createInstructionCombiningPass());
addExtensionsToPM(EP_Peephole, MPM);		addExtensionsToPM(EP_Peephole, MPM);
MPM.add(createJumpThreadingPass()); // Thread jumps		MPM.add(createJumpThreadingPass()); // Thread jumps
MPM.add(createCorrelatedValuePropagationPass());		MPM.add(createCorrelatedValuePropagationPass());
MPM.add(createDeadStoreEliminationPass()); // Delete dead stores		MPM.add(createDeadStoreEliminationPass()); // Delete dead stores

addExtensionsToPM(EP_ScalarOptimizerLate, MPM);		addExtensionsToPM(EP_ScalarOptimizerLate, MPM);

if (RerollLoops)		if (RerollLoops) {
		// Run LICM before loop rerolling, as rerolling cannot handle any
		// loop-invariant values at all.
		MPM.add(createLICMPass());
MPM.add(createLoopRerollPass());		MPM.add(createLoopRerollPass());
		}
if (!RunSLPAfterLoopVectorization) {		if (!RunSLPAfterLoopVectorization) {
if (SLPVectorize)		if (SLPVectorize)
MPM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.		MPM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.

if (BBVectorize) {		if (BBVectorize) {
MPM.add(createBBVectorizePass());		MPM.add(createBBVectorizePass());
MPM.add(createInstructionCombiningPass());		MPM.add(createInstructionCombiningPass());
addExtensionsToPM(EP_Peephole, MPM);		addExtensionsToPM(EP_Peephole, MPM);
▲ Show 20 Lines • Show All 319 Lines • Show Last 20 Lines