This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Sink entire inner loops.
Needs RevisionPublic

Authored by chrisdiamand_arm on Feb 12 2016, 9:36 AM.

Details

Summary

[LICM] Sink entire inner loops.

Currently, LICM can only move individual instructions. However, inner
loops can also be moved: If the outputs of a loop, (which are just
the phi nodes in the exit block, thanks to LCSSA form) are not used
in the outer loop, it can be sunk.

This patch teaches the LICM pass to sink an entire inner loop to the
exit block of its parent loop.

Diff Detail

Event Timeline

chrisdiamand_arm retitled this revision from to [LICM] Hoist and sink entire inner loops..
chrisdiamand_arm updated this object.
chrisdiamand_arm added a subscriber: llvm-commits.
jmolloy requested changes to this revision.Feb 15 2016, 9:15 AM
jmolloy edited edge metadata.

Hi Chris,

Thanks for working on this! This is a very impressively written patch. Most of my comments are to do with coding style and using more of LLVM's frameworks to avoid using helper functions.

Cheers,

James

test/Analysis/ScalarEvolution/2012-03-26-LoadConstant.ll
1

Why have these orders changed?

test/Transforms/LICM/inner-loop-dont-sink.ll
5

We generally only use CHECK-LABEL: for delineating between multiple testcases. Its only purpose is to stop llvm-lit running from one testcase to another.

As you've only got one testcase here, you should just use CHECK:.

jmolloy added inline comments.Feb 15 2016, 9:15 AM
include/llvm/Analysis/LoopInfo.h
369

This is a very public header, so please add good doxygen comments.

lib/Transforms/Scalar/LICM.cpp
331

don't need to check PN here because the loop condition has already checked it.

344

s/BB->getInstList()/*BB/

345

instead of !dyn_cast, use !isa<PHINode>().

383

s/TODO/FIXME/

392

Same comment as above: use isa<> instead of dyn_cast<> here if you're not using the result.

409

I'd rename this "replaceSuccessorIf", which implies the predicate. Functions that take predicates are also nicer to call if the predicate is last (it makes formatting a lambda easier).

Actually, I'd call it "replaceFirstSuccessorIf", as it doesn't replace multiple successors.

412

Capital "I" is the convention.

461

Single-line if statements should have their braces elided.

468

You should never need to directly modify InstList. Instead, just do:

PN->moveBefore(NewExitBlock->getTerminator())

(and remove PN->removeFromParent()).

OOI, it looks like NewExitBlock doesn't have a Terminator instruction at this point. I find it's always easier to make the block well-formed (add a terminator) as soon as possible, as it makes dumps, sanity checks and insertions (like this) easier.

473

You don't need this helper function:

SubLoopPredecessor->getTerminator()->replaceUsesOfWith(SubLoopHeader, OldExitBlock);
478
for (auto *PN = OldExitBlock.begin(); isa<PHINode>(PN); ++PN)
  PN->replaceUsesOfWith(ExitingBlock, SubLoopPredecessor);
483

You can use RAUW just like in line 473 to remove the need for the helper.

487

No braces around single-line statements.

Also:

for (auto *BB : SubLoop->blocks())
  CurLoop->removeBlockFromLoop(BB);
512

Assuming you've added a dummy terminator before now to keep NewExitBlock well-formed, you can simply do:

PreheaderTerminator->moveBefore(NewExitBlock->getTerminator());
NewExitBlock->getTerminator()->eraseFromParent();
533

Algorithmically this might be easier done by first splitting the outer loop's exit block after all PHI nodes then inserting the subloop in between. (see BasicBlock::split)

596

*BB, not BB->getInstList()

659

These should be inside DEBUG() macros for release builds.

1437

I'm not sure I understand why these have been changed?

This revision now requires changes to proceed.Feb 15 2016, 9:15 AM
lib/Transforms/Scalar/LICM.cpp
208

This conflicts with r260892 committed yesterday - I'll fix the merge conflicts in the next version.

659

Isn't everything in verifyLoop() wrapped in #ifndef NDEBUG anyway?

1437

The lookup() method inserts a null value into LoopToAliasSetMap when the key isn't found. This means that the assert(LoopToAliasSetMap.empty()) statement in doFinalization() fails.

find() doesn't add an entry when one doesn't already exist, so avoids this.

test/Analysis/ScalarEvolution/2012-03-26-LoadConstant.ll
1

When I first wrote it I had to switch them for some reason, but I've just tried it again and it's no longer needed. Will put them back.

reames edited edge metadata.Feb 16 2016, 7:40 PM

Taking a step back, can you give a motivating example on why we might want to do this? Your tests look like they'd be caught by loop-unswitch and LICM together, but I suspect that's just because the tests are (correctly) simple.

test/Transforms/LICM/inner-loop-dont-sink.ll
5

Er, this disagrees with quite a few other tests. :) Using CHECK-LABEL to ensure things are in the right basic block seems entirely reasonable to me.

chrisdiamand_arm updated this object.
chrisdiamand_arm edited edge metadata.
chrisdiamand_arm marked 19 inline comments as done.

Hopefully this address all your feedback, James.

  • I've completed removed the replaceSuccessor helper function, and replacePhiUses now no longer requires a predicate (as it's only used for one thing).
  • It also now uses splitBasicBlock during sinking, and inserts the loop between the two parts of the original exit block. The loop over the outer loop's exit block's phi nodes is still required, although I've updated the comment to explain it more clearly.
  • Style issues should be fixed.

Thanks!
Chris

chrisdiamand_arm edited edge metadata.

Taking a step back, can you give a motivating example on why we might want to do this? Your tests look like they'd be caught by loop-unswitch and LICM together, but I suspect that's just because the tests are (correctly) simple.

The motivation behind this is to handle things like the following:

void expensive_loop(...) {
  for (...)
    do_stuff;
}

for (int i = 0; i < big_number; ++i)
  expensive_loop(...);

I noticed that setting __attribute__((noinline)) on expensive_loop() made it run much faster. If inlining is disabled, the call to expensive_loop() can be sunk and is only run once. With inlining, the loop in expensive_loop() gets turned into a subloop, and can't be moved, so the inner loop is run big_number times. This patch would allow the inlined contents of expensive_loop() to be removed from the outer loop.

Cheers!
Chris

mcrosier added inline comments.
test/Transforms/LICM/inner-loop-dont-sink.ll
7

I tend to agree with Philip here. It also avoids issues if/when someone adds an additional test case to this file.

test/Transforms/LICM/inner-loop-dont-sink.ll
7

Ok, I've just looked this up here:

It is treated identically to a normal CHECK directive except that FileCheck makes an additional assumption that a line matched by the directive cannot also be matched by any other check present in match-filename

So James is correct in that it shouldn't be used on every label, but that's because a label (e.g. entry) may not be unique if another test case is added. I think I need to do something like:

; CHECK-LABEL: main
; CHECK: entry:
...

...because main is a unique function name within the file, even if another test is added.
Does that sound reasonable?

mcrosier added inline comments.Feb 17 2016, 7:45 AM
test/Transforms/LICM/inner-loop-dont-sink.ll
7

Correct. You should have a CHECK-LABEL directive on main and only main. Generally, each function name within a file (which must be unique) should have an associated CHECK-LABEL.

This adds one '; CHECK-LABEL: @main(' line to each new test.

I'm still going over the code, but the subloop management looks highly suspect. I'm deeply suspicions that this is not interacting properly with the LoopPassManager. Good places to look for inspiration are LoopUnswitch and LoopDeletion since they both manipulate the loop nest structure.

lib/Transforms/Scalar/LICM.cpp
720

FYI, this part in particular looks really really suspect. I'm not quite sure what the right way to solve this is, but this probably isn't it. :)

More random comments. Note that this is likely fairly far from submission in it's current form. You might want to think about options for splitting this up so that we don't get caught in a long review cycle.

lib/Transforms/Scalar/LICM.cpp
120

You'll need to enumerate which passes are preserved without this.

351

This should be checked by the caller. Possible in that helper function: isHeaderOfImmediateSubLoop?

354

Given these two sets of checks are repeated, a helper function would be good. Alternatively, is isLoopSimplifyForm sufficient?

361

Do we have any guarantee at this point that CurLoop has only one exit? If so, assert it.

379

Everywhere you have isa<BranchInst> you probably want to introduce handle all terminators except invokes.

384

This is a *really* expensive way to phrase this query. It'll be general, but slow. Can you look for an alternate way to express this for the entire loop in one go?

609

BB is already available in this scope.

613

Introducing a helper function isHeaderOfSubLoop would make this far easier to follow.

622

I don't see that your updating DT here. This is problematic since we'll be walking a stale tree.

Hi, thanks for taking a look at this! Comments inline.

lib/Transforms/Scalar/LICM.cpp
354

isLoopSimplifyForm allows multiple exit blocks, so it's not sufficient as currently written. LICM can sink individual instructions to multiple exit blocks IIRC, but it has to duplicate the instruction for each exit block. I'm not sure we'd want to duplicate entire inner loops though?

361

Good point, this should be checked by canHoistSubLoop. I'll add an assert, too.

379

I'm not sure about this - wouldn't that allow something like an indirect branch to a function with side-effects to be hoisted?

384

I think all of this stuff has to be checked for every instruction at some point. I think some checks are redundant though - canSinkOrHoistInst actually calls isSafeToExecuteUnconditionally, so the second call is redundant, for example (I just have both because that's what the original hoisting code does).

Some of this stuff is calculated during normal single instruction hoisting/sinking, so I'll investigate if information from that can be reused somehow.

609

Good point :)

622

Ok. I'll take a look at the other passes you mentioned to see what they do about this.

720

This bit was pretty tricky, and I agree it's not ideal. Here I used LP to access LICM::deleteAnalysisSubloop, which frees the AST (maybe it looks like I'm trying to tell the LoopPassManager about the deleted loop here?). An alternative would be to make sinkRegion and hoistRegion methods of LICM.

Either way, the AST management has to change a bit, otherwise hoisted subloops' ASTs don't get freed.

chrisdiamand_arm retitled this revision from [LICM] Hoist and sink entire inner loops. to [LICM] Sink entire inner loops..
chrisdiamand_arm updated this object.
chrisdiamand_arm marked an inline comment as done.

This update removes the hoisting code, in a bid to make the diff a bit more manageable.

It addresses two main issues pointed out by Philip:

  • It now tells the LPPassManager about loops which have been moved.
  • It recalculates the dominator tree after each pass.

There's also some other stuff, mostly factoring out various checks so they can be re-used for hoisting in a later patch.

In particular I'd welcome any comments on the dominator tree handling. My assumption here is that we don't need to update the DT in sinkRegion() (even though it's being traversed), because all the children of a subloop have already been visited by the time the sinking actually happens. One benefit of leaving out the hoisting for now is that the above assumption doesn't hold for hoisting...

Cheers!

lib/Transforms/Scalar/LICM.cpp
120

I think Chandler's recent patches to LICM et al now mean I now don't need to add anything here.

613

I'm not sure what you're after here, is it really that unclear? I've kept this check out of canSinkSubLoop() so that I can do everything after this line in terms of Loop *SubLoop instead of referring to the subloop by its header.

chrisdiamand_arm updated this object.

This now updates the DominatorTree incrementally, rather than recalculating the whole thing at the end of runOnLoop().

Has anyone had a chance to look at this yet?

Cheers!
Chris

Ping. Also (I should have mentioned this earlier), I've tested this with the regression tests, the LLVM test suite, and several proprietary benchmarks.

Updating the diff after rebasing on ToT (there was a conflict with some reformatting).

Has anyone got any comments on this? In particular, the interaction with the LoopPassManager, and keeping the DominatorTree up-to-date have been improved since Philip pointed those issues out.

Cheers,
Chris

Generally looks good to me.
I can't approve it, though.

lib/Transforms/Scalar/LICM.cpp
359

Any reason to not just use the block iterator instead of converting operands to blocks repeatedly?

374

Please add a message

394

The number of times you do this makes me wonder if we shouldn't just have a phi_iterator for the basic block.

421

Do we really have no branch redirect utility that does this?
(I thought we did have one that did this and also updated dominators)

chrisdiamand_arm marked 2 inline comments as done.Jun 1 2016, 7:37 AM

Replies inline - cheers!

lib/Transforms/Scalar/LICM.cpp
359

Yep - the index is required for setIncomingBlock anyway, so I thought it seemed cleaner to use it throughout instead of mixing indices and iterators.

394

That would be extremely useful, I think there are 4 here. Probably outside the scope of this patch though...

421

Not that I can find, through lots of recursive grepping. This particular code is quite specific to LICM (or at least loop transformations) anyway - it has to redirect only the branches which point outside the subloop. Are there any other passes which replace loop exit blocks that could benefit from this being factored out?

chrisdiamand_arm marked 3 inline comments as done.

Add messages to assertions.

Rebase and fix conflicts with renaming 'LICMSafetyInfo' to 'LoopSafetyInfo'. Also ping :)

Hi - does anyone have any thoughts on this?

sanjoy resigned from this revision.Jun 24 2017, 12:41 PM

Inactive, as far as I can tell.

This revision now requires changes to proceed.Jun 24 2017, 12:41 PM