Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit is taken on 1st iteration, we make all branches in the following exiting blocks always branch to the loop and also we replace all loop header PHI nodes with the values from the loop preheader (because the backedge is never taken and the loop is in the Loop Simplify Form) and simplify their uses.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Fine by me with nits, but please get someone else's approval.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1320 | { } not needed | |
llvm/test/Transforms/IndVarSimplify/eliminate-backedge.ll | ||
2 | Please add another run commant with -passes=indvars (new PM). | |
65 | Can you pls make one of the exits leave the loop by true condition and stay by false condition, just to see how the branch get folded in this case? |
Resolved style issues and updated test to have an exiting branch on true condition.
Now we mark all exits after the one taken on the 1st iteration as not live, so that they never branch to exit.
Try to simplify PHI uses after substituting the preheader value.
Now if proved that some exit is taken on 1st iteration we replace all further branches' conditions with the ones branching to the loop (instead of their exits).
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1325 | Not sure about this. Potentially expensive compile time wise, and not clear what consequences changing the content of other loops has. Might be reasonable for enabling more other transforms, though. Let's split it off and think through. |
Dima doen't have access yet; I will check in the test now. @dmakogon please rebase after it's merged.
New test file was merged with another patch, so rebased this one and updated test checks.
I think there are two patches here - ExitsOnFirstIter, and replaceLoopPHINodesWithPreheaderValues() change.
I think it may be nice to split the patch into two, with replaceLoopPHINodesWithPreheaderValues() being the first one.
I think there's generality that may be missing here.
We know which exit exits on the first iteration, which means that all the following exits are not reached.
But does that tell us anything about the preceding exits?
I'm roughly thinking about this situation: https://godbolt.org/z/KdKb3jzqW (ignore that it is optimized already)
Also, much like with that simplifycfg patch, should we be emitting assumptions here?
All that being said, seems reasonable.
llvm/test/Transforms/IndVarSimplify/floating-point-iv.ll | ||
---|---|---|
345 ↗ | (On Diff #370847) | Please regenerate the checklines in the affected test files before committing the actual patch. |
Agreed. Dima, please split them up and I'll merge them.
AFAIK getExitCount returns exact exit count for a given block *under assumption that this exit will be taken*. It doesn't account for preceeding checks. In general case, we can do the following opt. Given 2 exits A and B, exact counts known for both, A dominates B and exact exit count of A is strictly greater than those for B. In this case we can remove exit A as never taken. Not sure if IndVars already does it, let me try to wright a simple example...
BTW, every preceeding check with strictly positive exit cound can be removed.
! In D108910#2996649, @mkazantsev wrote:
Not sure if IndVars already does it, let me try to wright a simple example...
...Yes it does.
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py ; RUN: opt -indvars -S < %s | FileCheck %s ; RUN: opt -passes=indvars -S < %s | FileCheck %s declare void @never_called() declare void @will_be_called() define void @test_01(i32 %a, i32 %b) { ; CHECK-LABEL: @test_01( ; CHECK-NEXT: entry: ; CHECK-NEXT: [[GUARD_COND:%.*]] = icmp ugt i32 [[A:%.*]], [[B:%.*]] ; CHECK-NEXT: br i1 [[GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[FAILURE:%.*]] ; CHECK: loop.preheader: ; CHECK-NEXT: br label [[LOOP:%.*]] ; CHECK: loop: ; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[LOOP_PREHEADER]] ] ; CHECK-NEXT: br i1 false, label [[NEVER_CALLED:%.*]], label [[BACKEDGE]] ; CHECK: backedge: ; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1 ; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[IV]], [[B]] ; CHECK-NEXT: br i1 [[COND_2]], label [[WILL_BE_CALLED:%.*]], label [[LOOP]] ; CHECK: never_called: ; CHECK-NEXT: call void @never_called() ; CHECK-NEXT: ret void ; CHECK: will_be_called: ; CHECK-NEXT: call void @will_be_called() ; CHECK-NEXT: ret void ; CHECK: failure: ; CHECK-NEXT: ret void ; entry: %guard_cond = icmp ugt i32 %a, %b br i1 %guard_cond, label %loop, label %failure loop: %iv = phi i32 [0, %entry], [%iv.next, %backedge] %cond_1 = icmp eq i32 %iv, %a br i1 %cond_1, label %never_called, label %backedge backedge: %iv.next = add i32 %iv, 1 %cond_2 = icmp eq i32 %iv, %b br i1 %cond_2, label %will_be_called, label %loop never_called: call void @never_called() ret void will_be_called: call void @will_be_called() ret void failure: ret void }
I have reverted this patch for a couple of reasons:
- The commit message was incorrect. The change did not actually break the backedge.
- There doesn't seem to have been any discussion in the review of why folding the exits to taken in provably dead code was the correct answer. In particular, why not fold to poison? Or just leave it to loop deletion? Motivation is important.
These comments should be easy to address. I reverted mostly because of the functional issue in the split off patch, and decided to revert both to be safe.
@reames I don't quite get the 2nd part of the revert motivation. Leaving it to loop deleiton is cool if there will be loop deletion. Branching out of loop looks better because SimplifyCFG can deal with it. Why would anything be better than leaving the loop (and potentially breaking it for further passes)? Any examples of this?
Max,
Sorry for delayed response, I apparently forgot to reply to this before leaving for vacation. My apologizes.
My second point is really two questions:
- Why do we need to fold these branches to constants at all? We've already proven that a dominating exit was taken, and thus the code being modified in this change is provably unreachable. I'd expect existing passes (SimplifyCFG, LoopDeletion, etc..) to handle the current form without issue. Do you have a case where this change actually effects the output after say simplify-cfg? I'd not expect that, and it almost seems like you might be covering up a missing transform elsewhere.
- Given we're dealing with unreachable code, why fold to untaken? Why not be more aggressive about identifying the explicit UB? Replacing terminators with "unreachable" would require non-trivial analysis update, so I get not doing that. But why not replace the condition with an explicit "poison"? That would be trivially immediate UB, and likely "clearer" for simplify-cfg and friends.
p.s. The title and description of this review still needs updated.
{ } not needed