This is an archive of the discontinued LLVM Phabricator instance.

Add LoopSimplifyCFG pass
ClosedPublic

Authored by escha on Jan 20 2016, 4:55 PM.

Details

Summary

super short version: this is a loop pass that does trivial CFG simplification on a loop, as requested by Chandler as the solution to the real problem below. it isn't used in the pass manager yet. right now it only merges consecutive blocks; it doesn't do anything fancier, but could in the future.

Details:

This IR has a perfectly reasonable nested loop that rotate -> unroll does not actually unroll all the way:

define i32 @foo(i32* %P, i64 *%Q) {
entry:
  br label %outer

outer:
  %y.2 = phi i32 [ 0, %entry ], [ %y.inc2, %outer.latch2 ]
  br label %inner

inner:
  %x.2 = phi i32 [ 0, %outer ], [ %inc2, %inner ]
  %inc2 = add nsw i32 %x.2, 1
  %exitcond2 = icmp eq i32 %inc2, 3
  store i32 %x.2, i32* %P
  br i1 %exitcond2, label %outer.latch, label %inner

outer.latch:
  %y.inc2 = add nsw i32 %y.2, 1
  %exitcond.outer = icmp eq i32 %y.inc2, 3
  store i32 %y.2, i32* %P
  br i1 %exitcond.outer, label %exit, label %outer.latch2

outer.latch2:
  %t = sext i32 %y.inc2 to i64 
  store i64 %t, i64* %Q
  br label %outer

exit:
  ret i32 0
}

This is because after unrolling the inner loop, the outer loop has two header blocks, which while valid and canonical in terms of LCSSA, is not what loop rotate understands. The hack solution is to run rotate -> unroll -> simplifycfg-> rotate -> unroll, which is bad. The slightly less hack is to put this simplification into LoopSimplify, which Chandler argues is a bad idea because LoopSimplify specifically simplifies in ways that maintain the canonical form, and nothing else (and we may want to run LoopSimplifyCFG in other places for other reasons). Chandler suggests that the most general solution is just to add a much-needed LoopSimplifyCFG, which I did.

The problem with using this right now is that in practice, you need a pipeline that looks like this to make use of it:

LoopPassManager:

  • Loop SimplifyCFG
  • Loop Rotate
  • Loop Unroll

And currently the PassManagerBuilder causes the LPMs to be split up due to analyses that are required being inserted in between (which chandler is working on). However, with a shim to require the associated analyses, this does work in practice in our pipeline out of tree, and a test just for this pass is included.

This is important to us because we have critical benchmark code that takes a form similar to this and similarly fails to unroll, resulting in catastrophic performance regressions.

Diff Detail

Repository
rL LLVM

Event Timeline

escha updated this revision to Diff 45465.Jan 20 2016, 4:55 PM
escha retitled this revision from to Add LoopSimplifyCFG pass.
escha updated this object.
escha set the repository for this revision to rL LLVM.
escha added a subscriber: llvm-commits.
escha updated this object.Jan 20 2016, 4:55 PM
escha updated this object.Jan 20 2016, 4:58 PM
mzolotukhin edited edge metadata.Jan 20 2016, 5:37 PM

Hi,

First of all, thanks for working on this! One question on the test below.

Michael

test/Transforms/LoopSimplifyCFG/merge-header.ll
5–6

Which basic blocks do we merge here? From the check-lines it looks like we merge %entry with %outer, which doesn't sound right, since we want to merge only blocks inside loops. Am I misreading something here?

Minor drop by comments inline.

lib/Transforms/Scalar/LoopSimplifyCFG.cpp
38

Minor: convention is to not indent namespaces: http://llvm.org/docs/CodingStandards.html#namespace-indentation

Also, why not make this a struct?

91

Minor: coding style is to name this Changed.

106

Won't this invalidate E?

escha added inline comments.Jan 20 2016, 9:13 PM
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
38

I was just copying LoopDeletion to make this pass; has the style changed since then?

91

Okay, will change this on the next update.

106

I don't *think* so? This only modifies things earlier in the list, not later. Should I recalculate E on each iteration just to be sure?

test/Transforms/LoopSimplifyCFG/merge-header.ll
5–6

We want to merge Entry and Outer here, yes. Entry is the loop header, Outer + Inner are the body, Outer.Latch2 is the latch, and Exit is the loop exit. Entry + Outer can be merged in this case.

majnemer added inline comments.
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
106

Loop::block_end accesses the end iterator of a vector. LoopInfo::removeBlock calls Loop::removeBlockFromLoop which will erase an element from the vector. std::vector::erase invalidates all iterators at or beyond the point of the erase including the vector's end iterator.

sanjoy added inline comments.Jan 21 2016, 8:14 AM
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
38

My guess would be that that indentation rule was added after LoopDeletion was added to LLVM. In any case, it is in the coding standard now, so we should just follow that. :)

If you have spare time, I think fixing the indentation in LoopDeletion in a separate change will also be a appropriate thing to do.

106

Now that I think of it, is the ++I safe? Won't I be invalidated by the erase?

chandlerc added inline comments.Jan 21 2016, 8:53 AM
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
38

Feel free to just throw some clang-format on this code since it is "new", it'll do the right thing for you. =]

106

My suggestion would be to copy the loop blocks into a local smallvector, and then just loop over that so you don't have to worry about this and can freely update LoopInfo as you go.

escha updated this revision to Diff 45565.Jan 21 2016, 10:49 AM
escha edited edge metadata.

Updated diff with suggested changes.

mzolotukhin added inline comments.Jan 21 2016, 12:32 PM
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
89

Nitpick: s/changed/Changed/

test/Transforms/LoopSimplifyCFG/merge-header.ll
6–7

Hmm, the header here is %outer, %entry is a preheader:

$ opt -analyze -loops < merge-header.ll
Printing analysis 'Natural Loop Information' for function 'foo':
Loop at depth 1 containing: %outer<header>,%inner<exiting>,%outer.latch2<latch>

That said, I think the test is correct in current form (I misread it first time), but I think we might want to make it more explicit. Like, we can check something like this:

CHECK:      entry:
CHECK-NEXT:   br label %[[LOOP:%.]]
CHECK:      [[LOOP]:
CHECK-NEXT:   phi
CHECK-NOT:    br label
CHECK:        br i1

This way we make sure that the blocks are actually merged.

sanjoy added inline comments.Jan 21 2016, 12:34 PM
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
110

This is going to free Pred, right? Is it possible that Pred is after Succ in Blocks and you'll end up trying to examine a free'ed block later?

anemet added a subscriber: anemet.Jan 21 2016, 12:46 PM

I also like the idea.

I guess one more difference between this pass and LoopSimplify that LoopSimplify is a function pass whereas this one is a loop pass. So you do get the nice inner-to-outer propagation you're looking for.

escha added inline comments.Jan 21 2016, 3:40 PM
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
110

Ugh, I guess this is possible. How do we avoid this given iterator invalidation issues?

sanjoy added inline comments.Jan 21 2016, 3:45 PM
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
110

I think things would Just Work(TM) if you have SmallVector<WeakVH, 16> Blocks, and check if an entry has been null'ed out before you do anything with it.

zzheng added a subscriber: zzheng.Jan 21 2016, 4:05 PM
escha updated this revision to Diff 45715.Jan 22 2016, 10:56 AM

Updated per comments.

reames added a subscriber: reames.Jan 25 2016, 1:32 PM

A meta concern here: how do we share code between this pass and the existing SimplifyCFG pass? It would be really unfortunate if we ended up duplicating large chunks of SimplifyCFG here. The simple unconditional edge folding case here is fine as a starting point and the code duplication isn't too terrible, but how do we avoid problems going forward?

escha added a comment.Jan 25 2016, 1:35 PM

The long-term plan is that if large chunks of SimplifyCFG are moved here for any reason, they'll be factored out into utility functions (to avoid duplicating things).

I have a couple of comments inline, after which this looks good to me.

lib/Transforms/Scalar/LoopSimplifyCFG.cpp
70

Why do you need to depend on TargetTransformInfoWrapperPass and AssumptionCacheTracker?

97

I think you're copying ValueHandle s here (which will call ValueHandleBase::AddToExistingUseList). Can you change this to const auto & or auto & ?

100

You should be able to use cast_or_null here.

escha added inline comments.Jan 29 2016, 2:13 PM
lib/Transforms/Scalar/LoopSimplifyCFG.cpp
70

I guess I don't fully understand this section. Are these "dependencies" things we need to run, or are these "dependencies" things we can pass on to the next pass (preserved), or...? I notice a lot of passes seem to have things here that aren't listed as Required. How does this section relate to the Required/Preserves bit?

escha added a comment.Jan 29 2016, 2:35 PM

Nitpicks resolved and approved by Justin offline (he explained the pass dependencies bit to me as well).

escha accepted this revision.Jan 29 2016, 2:40 PM
escha added a reviewer: escha.
This revision is now accepted and ready to land.Jan 29 2016, 2:40 PM
escha closed this revision.Jan 29 2016, 2:41 PM