This is an archive of the discontinued LLVM Phabricator instance.

[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration
ClosedPublic

Authored by dmakogon on Aug 30 2021, 2:28 AM.

Details

Summary

Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit is taken on 1st iteration, we make all branches in the following exiting blocks always branch to the loop and also we replace all loop header PHI nodes with the values from the loop preheader (because the backedge is never taken and the loop is in the Loop Simplify Form) and simplify their uses.

Diff Detail

Event Timeline

dmakogon created this revision.Aug 30 2021, 2:28 AM
dmakogon requested review of this revision.Aug 30 2021, 2:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2021, 2:28 AM

Fine by me with nits, but please get someone else's approval.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
1320

{ } not needed

llvm/test/Transforms/IndVarSimplify/eliminate-backedge.ll
2

Please add another run commant with -passes=indvars (new PM).

65

Can you pls make one of the exits leave the loop by true condition and stay by false condition, just to see how the branch get folded in this case?

dmakogon updated this revision to Diff 369640.Aug 31 2021, 12:19 AM

Resolved style issues and updated test to have an exiting branch on true condition.
Now we mark all exits after the one taken on the 1st iteration as not live, so that they never branch to exit.

dmakogon marked 3 inline comments as done.Aug 31 2021, 12:19 AM
dmakogon planned changes to this revision.Aug 31 2021, 1:44 AM
dmakogon updated this revision to Diff 370138.Sep 1 2021, 7:27 PM

Try to simplify PHI uses after substituting the preheader value.
Now if proved that some exit is taken on 1st iteration we replace all further branches' conditions with the ones branching to the loop (instead of their exits).

dmakogon updated this revision to Diff 370143.Sep 1 2021, 7:45 PM

Fix style issues

dmakogon edited the summary of this revision. (Show Details)Sep 1 2021, 7:59 PM
dmakogon updated this revision to Diff 370160.Sep 1 2021, 10:12 PM
dmakogon updated this revision to Diff 370205.Sep 2 2021, 2:25 AM

Apply clang-format

mkazantsev added inline comments.Sep 2 2021, 3:19 AM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
1325

Not sure about this. Potentially expensive compile time wise, and not clear what consequences changing the content of other loops has. Might be reasonable for enabling more other transforms, though.

Let's split it off and think through.

dmakogon updated this revision to Diff 370222.Sep 2 2021, 3:55 AM

Now we do not simplify replaced PHI nodes leaving it for further passes

Please can you precommit the test regeneration and the new tests?

Dima doen't have access yet; I will check in the test now. @dmakogon please rebase after it's merged.

dmakogon updated this revision to Diff 370847.Sep 5 2021, 10:36 PM
dmakogon added a reviewer: lebedev.ri.

New test file was merged with another patch, so rebased this one and updated test checks.

Fine by me. Roman?

lebedev.ri accepted this revision.Sep 7 2021, 1:45 AM

I think there are two patches here - ExitsOnFirstIter, and replaceLoopPHINodesWithPreheaderValues() change.
I think it may be nice to split the patch into two, with replaceLoopPHINodesWithPreheaderValues() being the first one.

I think there's generality that may be missing here.
We know which exit exits on the first iteration, which means that all the following exits are not reached.
But does that tell us anything about the preceding exits?
I'm roughly thinking about this situation: https://godbolt.org/z/KdKb3jzqW (ignore that it is optimized already)

Also, much like with that simplifycfg patch, should we be emitting assumptions here?

All that being said, seems reasonable.

llvm/test/Transforms/IndVarSimplify/floating-point-iv.ll
345 ↗(On Diff #370847)

Please regenerate the checklines in the affected test files before committing the actual patch.

This revision is now accepted and ready to land.Sep 7 2021, 1:45 AM
mkazantsev added a comment.EditedSep 12 2021, 8:23 PM

I think there are two patches here - ExitsOnFirstIter, and replaceLoopPHINodesWithPreheaderValues() change.
I think it may be nice to split the patch into two, with replaceLoopPHINodesWithPreheaderValues() being the first one.

Agreed. Dima, please split them up and I'll merge them.

I think there's generality that may be missing here.
We know which exit exits on the first iteration, which means that all the following exits are not reached.
But does that tell us anything about the preceding exits?

AFAIK getExitCount returns exact exit count for a given block *under assumption that this exit will be taken*. It doesn't account for preceeding checks. In general case, we can do the following opt. Given 2 exits A and B, exact counts known for both, A dominates B and exact exit count of A is strictly greater than those for B. In this case we can remove exit A as never taken. Not sure if IndVars already does it, let me try to wright a simple example...

BTW, every preceeding check with strictly positive exit cound can be removed.

mkazantsev accepted this revision.Sep 12 2021, 8:32 PM

! In D108910#2996649, @mkazantsev wrote:
Not sure if IndVars already does it, let me try to wright a simple example...

...Yes it does.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -indvars -S < %s | FileCheck %s
; RUN: opt -passes=indvars -S < %s | FileCheck %s

declare void @never_called()
declare void @will_be_called()

define void @test_01(i32 %a, i32 %b) {
; CHECK-LABEL: @test_01(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[GUARD_COND:%.*]] = icmp ugt i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    br i1 [[GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[FAILURE:%.*]]
; CHECK:       loop.preheader:
; CHECK-NEXT:    br label [[LOOP:%.*]]
; CHECK:       loop:
; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
; CHECK-NEXT:    br i1 false, label [[NEVER_CALLED:%.*]], label [[BACKEDGE]]
; CHECK:       backedge:
; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV]], 1
; CHECK-NEXT:    [[COND_2:%.*]] = icmp eq i32 [[IV]], [[B]]
; CHECK-NEXT:    br i1 [[COND_2]], label [[WILL_BE_CALLED:%.*]], label [[LOOP]]
; CHECK:       never_called:
; CHECK-NEXT:    call void @never_called()
; CHECK-NEXT:    ret void
; CHECK:       will_be_called:
; CHECK-NEXT:    call void @will_be_called()
; CHECK-NEXT:    ret void
; CHECK:       failure:
; CHECK-NEXT:    ret void
;
entry:
  %guard_cond = icmp ugt i32 %a, %b
  br i1 %guard_cond, label %loop, label %failure

loop:
  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
  %cond_1 = icmp eq i32 %iv, %a
  br i1 %cond_1, label %never_called, label %backedge

backedge:
  %iv.next = add i32 %iv, 1
  %cond_2 = icmp eq i32 %iv, %b
  br i1 %cond_2, label %will_be_called, label %loop

never_called:
  call void @never_called()
  ret void

will_be_called:
  call void @will_be_called()
  ret void

failure:
  ret void
}
reames added a subscriber: reames.Sep 13 2021, 10:17 AM

I have reverted this patch for a couple of reasons:

  1. The commit message was incorrect. The change did not actually break the backedge.
  2. There doesn't seem to have been any discussion in the review of why folding the exits to taken in provably dead code was the correct answer. In particular, why not fold to poison? Or just leave it to loop deletion? Motivation is important.

These comments should be easy to address. I reverted mostly because of the functional issue in the split off patch, and decided to revert both to be safe.

@reames I don't quite get the 2nd part of the revert motivation. Leaving it to loop deleiton is cool if there will be loop deletion. Branching out of loop looks better because SimplifyCFG can deal with it. Why would anything be better than leaving the loop (and potentially breaking it for further passes)? Any examples of this?

reames added a comment.EditedSep 27 2021, 11:32 AM

@reames I don't quite get the 2nd part of the revert motivation. Leaving it to loop deleiton is cool if there will be loop deletion. Branching out of loop looks better because SimplifyCFG can deal with it. Why would anything be better than leaving the loop (and potentially breaking it for further passes)? Any examples of this?

Max,

Sorry for delayed response, I apparently forgot to reply to this before leaving for vacation. My apologizes.

My second point is really two questions:

  1. Why do we need to fold these branches to constants at all? We've already proven that a dominating exit was taken, and thus the code being modified in this change is provably unreachable. I'd expect existing passes (SimplifyCFG, LoopDeletion, etc..) to handle the current form without issue. Do you have a case where this change actually effects the output after say simplify-cfg? I'd not expect that, and it almost seems like you might be covering up a missing transform elsewhere.
  2. Given we're dealing with unreachable code, why fold to untaken? Why not be more aggressive about identifying the explicit UB? Replacing terminators with "unreachable" would require non-trivial analysis update, so I get not doing that. But why not replace the condition with an explicit "poison"? That would be trivially immediate UB, and likely "clearer" for simplify-cfg and friends.

p.s. The title and description of this review still needs updated.