Page MenuHomePhabricator

[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches

Authored by mkazantsev on Nov 1 2018, 8:15 PM.



This patch introduces infrastructure and the simplest case for constant-folding
of branch and switch instructions within loop into unconditional branches.
It is useful as a cleanup for such passes as loop unswitching that sometimes
produce such branches.

Only the simplest case supported in this patch: after the folding, no block
should become dead or stop being part of the loop. Support for more
sophisticated cases will go separately in follow-up patches.

Diff Detail


Event Timeline

mkazantsev created this revision.Nov 1 2018, 8:15 PM
mkazantsev updated this revision to Diff 172299.Nov 1 2018, 8:25 PM
mkazantsev updated this revision to Diff 172300.Nov 1 2018, 8:29 PM
mkazantsev updated this revision to Diff 172301.Nov 1 2018, 9:09 PM

Fix typo in dump()

reames added a comment.Nov 2 2018, 5:36 PM

Drive by comments.

It looks like you forgot to include your test cases?

182 ↗(On Diff #172301)

Why bother to keep around both sets when given an exit block you can answer liveness questions with only one copy? (Same question on other parallel dead/live structures BTW)

196 ↗(On Diff #172301)

I think you need a better name for this variable.

198 ↗(On Diff #172301)

Not clear to me why you need this bit of code in this patch.

248 ↗(On Diff #172301)

Where's the code for LI updating?

dmgreen added a subscriber: dmgreen.Nov 4 2018, 1:02 PM
dmgreen added inline comments.
64 ↗(On Diff #172301)

Should be CI?

anna added inline comments.Nov 5 2018, 9:17 AM
248 ↗(On Diff #172301)

From the summary of this patch, it looks like we don't need LI update in this patch:

Only the simplest case supported in this patch: after the folding, no block
should become dead or stop being part of the loop. Support for more
sophisticated cases will go separately in follow-up patches.

If that's the case, I'd really like an assert to make sure that the LI is still correct after this transform is completed.

I'll review the code to see if what's stated here is actually true (for example, we have a number of prechecks before performing the actual transform). See run function below.

anna added a comment.Nov 5 2018, 10:14 AM

Couple of comments inline. Pls add test cases.

64 ↗(On Diff #172301)

yup, looks like a typo.

136 ↗(On Diff #172301)

if constant folding will be done. At this point CF is not done.

159 ↗(On Diff #172301)

Pls add test cases where:

  1. Block's only incoming edge if constant folded will make the block dead - this transform shouldn't kick in.
  2. Block has multiple incoming edges - even if all except one is constant folded, the block is still live.
198 ↗(On Diff #172301)

If LoopIsLive is false, we should just return at this point now, right? For this patch, I believe this chunk of code is not needed. Am I missing something?

261 ↗(On Diff #172301)

Pls add test cases for each of the scenarios described below. The constantfold xform should kick in with this patch.

288 ↗(On Diff #172301)

We need a bunch of asserts here:

  1. assert that L.getHeader() is reachableFromEntry.
  2. assert that LI is correct (mentioned in earlier comment).
anna requested changes to this revision.Nov 5 2018, 10:19 AM

based on above comments.

This revision now requires changes to proceed.Nov 5 2018, 10:19 AM

As for the test: I have commited all related tests I am planning to support in this pass in the file test/Transforms/LoopSimplifyCFG/constant-fold-branch.ll. It has all situations with dead blocks, live blocks, dead loops etc, please take a look and tell me if some important case is missing. My plan is to support folding in all these cases incrementally.

As for comments in code, I'll address them shortly.

mkazantsev added inline comments.Nov 5 2018, 5:00 PM
64 ↗(On Diff #172301)

Good catch!

159 ↗(On Diff #172301)

Already in test/Transforms/LoopSimplifyCFG/constant-fold-branch.ll, it has a variety of such cases with inner/exit blocks and loops becoming dead.

182 ↗(On Diff #172301)

As we support more cases in follow-up patches, sets of live and dead blocks have different purposes and are all used. See:

This patch just collects all required information and enables the simplest case (that doesn't need all of it). However, as we add more logic, these sets are used more actively.

All this logic is only split into two patches to make it easier to review.

198 ↗(On Diff #172301)

You are right. But this patch is just collecting all required data for follow-up patches

My idea was to collect *all* required data in this patch and then add transforms that use it one by one, without adding more analysis. If you think it would be easier to review if the data collection is also added incrementally, I can rework the change.

248 ↗(On Diff #172301)

No LI update is needed because no block gets deleted or leaves the loop. This patch only handles the situation when it is so. You can take a look on follow-up dependent patches, they have LI updates.

I will add the assert that LI is still correct after the transform.

261 ↗(On Diff #172301)

All in test/Transforms/LoopSimplifyCFG/constant-fold-branch.ll. Currently they are all negative tests. Patches to this pass enable groups of them.

288 ↗(On Diff #172301)

AFAIK reachibility of header is implied by correctness of LI. Will do.

skatkov added inline comments.Nov 5 2018, 6:48 PM
48 ↗(On Diff #172301)

Please update the comment that you support only branch and switch terminators and for others you just return nullptr.

49 ↗(On Diff #172301)

In terms of compile time improvement I see that in analysis you use this function a lot.
So it makes sense to cache the result. So you need a set of live pairs <Src, Dest>.

70 ↗(On Diff #172301)

May be just return this LiveSucc here and after the loop just return the default dest?

103 ↗(On Diff #172301)

According to code how you fill in it, it seems that FoldCandidates contains only blocks from L, not form sub loops...

160 ↗(On Diff #172301)

With this check you want to process only blocks from the current loop, not in sub loops. What is the reason for that?

196 ↗(On Diff #172301)

In terms of analysis I think the restriction that there is an only one loop latch is redundant.

279 ↗(On Diff #172301)

It seems that this check should be the first one.
If there is no to fold so no one of the check above will be true.

345 ↗(On Diff #172301)

Be careful here in the future, if constantFoldTerminators will eliminate the current loop you should not run merge blocks...

mkazantsev planned changes to this revision.Nov 5 2018, 6:58 PM

Also Max, please consider split this patch into two ones. One just NFC patch with analysis and update tests to check the output of dump and then patches which does modifications.

Making a separate NFC which checks analyzes behavior doesn't sound because I am using sets to store blocks status, and printing out the contents of sets is non-deterministic.

mkazantsev marked 8 inline comments as done.Nov 6 2018, 10:34 PM
mkazantsev added inline comments.
49 ↗(On Diff #172301)

This check is free for branches, for switches it's O(num_succ) which is negligible. I think it's better keep it that way than add a cache which can be a source of potential bugs.

160 ↗(On Diff #172301)

If we modify some condition in the inner loop, we may end up needing to update its loop info, which we don't want to do. Besides, we expect that all branches in inner loops will have been folded by the moment when we will process the outer loop.

196 ↗(On Diff #172301)

It makes things easier, and we canonicalize most loops to this form, so I think we shouldn't bother about multi-latch case. We can support them later if needed.

345 ↗(On Diff #172301)

Yes, it will be handled when we support deletion of the current loop (I plan to make it as the last step).

Fixed typo bug, added LI and DT verification, changed FoldingCandidates to vector to keep deterministic order of updates, updated tests with decline messages to make sure that we exercise all possible scenarios in our test file.

anna added a comment.Nov 7 2018, 10:19 AM

comments inline. Algo looks fine.

346 ↗(On Diff #172898)

You're not running this to a fixed point, so why not just do Changed =

198 ↗(On Diff #172301)

pls add this with the patch that deletes the loop, instead of in this patch. I'd like to reduce the complexity of this patch.

288 ↗(On Diff #172301)

I checked, but LI.verify() doesn't contain that check for header is reachableFromEntry. We recompute a new LI and then verify original LI and the recomputed one is the same.

8 ↗(On Diff #172898)

I understand why you added these, but it's Really a lot of information in the debug output. I think "Give up constant terminator folding in loop header" is enough.

apilipenko added inline comments.
114–115 ↗(On Diff #172898)

Minor. Extract a local lambda?

mkazantsev added inline comments.Nov 7 2018, 9:18 PM
8 ↗(On Diff #172898)

Specifying the reason lets us be sure that we exercise all scenarios in this file. Besides, all these messages will be deleted one by one as the optimization is generalized. It's more a temporal thing for current development than something that will stay for long. Let's leave it as is, we're going to delete them anyways.

mkazantsev added inline comments.Nov 7 2018, 9:24 PM
198 ↗(On Diff #172301)

Sorry for misleading you earlier. This logic works if we *do not* delete the loop. I cannot remove it. It is required to detect the cases we are not supporting yet.

mkazantsev marked 3 inline comments as done.Nov 7 2018, 9:41 PM
mkazantsev added inline comments.
288 ↗(On Diff #172301)

Both things are asserted now.

mkazantsev marked 2 inline comments as done.

Added lambda for printing;
Reworked to make it clear that we are doing it for non-dead loops;
Simplified logic in constantFoldTerminators

anna added inline comments.Nov 13 2018, 1:29 PM
192 ↗(On Diff #173121)

I find this name misleading because the current loop can be deleted in 2 ways:

  1. latch to header edge is dead
  2. edge to header from outside this loop is dead

Second case does not happen now, because you only consider blocks in current loop as candidate for folding.

However, I think you need to add both cases for completeness here.

198 ↗(On Diff #172301)

okay, makes sense.

8 ↗(On Diff #172898)

okay, pls correct typo: "blocks that are not dead".

anna requested changes to this revision.Nov 14 2018, 6:48 AM

Forgot to mark request changes. See previous inline comments.

This revision now requires changes to proceed.Nov 14 2018, 6:48 AM
mkazantsev added inline comments.Nov 16 2018, 3:34 AM
192 ↗(On Diff #173121)

Technically, we only modify the current loop and cannot change anything outside this loop, so 2nd case is just out of scope.

mkazantsev added inline comments.Nov 16 2018, 3:41 AM
192 ↗(On Diff #173121)

I've added an explanatory comment why the second case is impossible in the field's comment.

mkazantsev marked an inline comment as done.

Fixed typo, added comment that explains why the loop can be dead in one way only.

anna accepted this revision.Nov 16 2018, 9:26 AM

LGTM. Thanks for working through the comments.

90 ↗(On Diff #174351)

Minor: can remove the "in practice" here.

This revision is now accepted and ready to land.Nov 16 2018, 9:26 AM
mkazantsev marked an inline comment as done.Nov 18 2018, 8:48 PM
This revision was automatically updated to reflect the committed changes.

Hello. I have an error from this, which I think may well be a knock-on affect for the code later in the pass. Something like this:

define void @repo(i32 %size) {
  br label %loop

  %i = phi i32 [ 0, %entry ], [ %inc, %loop2 ]
  br i1 true, label %loop1, label %loop2

  call void @something()
  br label %loop2

  %phi = phi i32 [ 1, %loop ], [ 2, %loop1 ]
  %inc = add i32 %i, %phi
  %cmp = icmp ult i32 %inc, 3900
  br i1 %cmp, label %loop, label %end

  ret void

declare void @something()

The br i1 true is const-folded, but the %phi gets the wrong value, taking the 1 where it should be taking the 2.

Thanks for pointing out, the patch has been reverted due to very same problem. It is now fixed by recommit patch rL347289.

Hello, Another one for you, this time to do with LCSSA not being preserved, run with "opt -loop-simplifycfg -indvars simplified.ll -S":

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-arm-none-eabi"

define void @c() {
  br label %d

d.loopexit:                                       ; preds = %if.end.7, %for.body
  %.lcssa = phi i32 [ %1, %for.body ], [ 0, %if.end.7 ]
  br label %d

d:                                                ; preds = %d.loopexit, %entry
  %0 = phi i32 [ undef, %entry ], [ %.lcssa, %d.loopexit ]
  br label %for.cond

for.cond:                                         ; preds = %if.end.8, %d
  %1 = phi i32 [ %0, %d ], [ 0, %if.end.8 ]
  br label %for.body

for.body:                                         ; preds = %for.cond
  %tobool2 = icmp eq i32 %1, 0
  br i1 %tobool2, label %if.end, label %d.loopexit

if.end:                                           ; preds = %for.body
  br label %if.end.7

if.end.7:                                         ; preds = %if.end
  br i1 true, label %if.end.8, label %d.loopexit

if.end.8:                                         ; preds = %if.end.7
  br label %for.cond

Thanks for the test, I will fix it.