Details

Reviewers

mkazantsev
lebedev.ri

Commits

rGd9ca444835e6: [IndVars] Break backedge and replace PHIs if loop exits on 1st iteration

Summary

Implement the TODO in optimizeLoopExits. If we have proved that some loop exit is taken on the 1st iteration, we now fold the branches in all subsequent exiting blocks so that they always stay in the loop. We also replace every loop header PHI node with its incoming value from the loop preheader (this is valid because the backedge is never taken and the loop is in Loop Simplify Form) and simplify their uses.
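To make the summary concrete, here is a minimal, self-contained C++ sketch of the folding order it describes. It is a toy model and not the code in IndVarSimplify.cpp: `ToyExit`, `Fold` and `foldExitsAfterFirstIterationExit` are hypothetical names, exit counts are plain integers rather than SCEV expressions, and the PHI replacement is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are LLVM types.
enum class Fold { None, Taken, NotTaken };

struct ToyExit {
  bool CountKnown;    // false models "exit count could not be computed"
  unsigned ExitCount; // only meaningful when CountKnown is true
  Fold Folded;        // how the exiting branch ended up being folded
};

// Exits must be listed in dominance order (earlier exits dominate later ones).
// Returns true if any branch was folded.
inline bool foldExitsAfterFirstIterationExit(std::vector<ToyExit> &Exits) {
  bool Changed = false;
  bool ExitsOnFirstIter = false;
  for (ToyExit &E : Exits) {
    if (ExitsOnFirstIter) {
      // An earlier exit is taken on iteration 0, so this exit is never taken.
      E.Folded = Fold::NotTaken;
      continue;
    }
    if (E.CountKnown && E.ExitCount == 0) {
      // This exit is taken on the very first iteration.
      E.Folded = Fold::Taken;
      ExitsOnFirstIter = true;
      Changed = true;
    }
  }
  return Changed;
}
```

Exits that come before the first zero-count exit are left untouched in this sketch, which matches the patch: it only folds the zero-count exit and the exits that follow it.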

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	130 ms	x64 debian > LLVM.Transforms/IndVarSimplify::eliminate-comparison.ll
	60 ms	x64 debian > LLVM.Transforms/IndVarSimplify::eliminate-exit-no-dl.ll
	90 ms	x64 debian > LLVM.Transforms/IndVarSimplify::floating-point-iv.ll
	120 ms	x64 debian > LLVM.Transforms/IndVarSimplify/X86::eliminate-trunc.ll
	100 ms	x64 windows > LLVM.Transforms/IndVarSimplify::eliminate-comparison.ll
		(8 tests failed in total)

Event Timeline

dmakogon created this revision. · Aug 30 2021, 2:28 AM

Herald added a subscriber: hiraditya. · Aug 30 2021, 2:28 AM

dmakogon requested review of this revision. · Aug 30 2021, 2:28 AM

Herald added a project: Restricted Project. · Aug 30 2021, 2:28 AM

Herald added a subscriber: llvm-commits.

Fine by me with nits, but please get someone else's approval.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
1320	{ } not needed
llvm/test/Transforms/IndVarSimplify/eliminate-backedge.ll
3	Please add another run command with `-passes=indvars` (new PM).
66	Can you please make one of the exits leave the loop on a `true` condition and stay on a `false` condition, just to see how the branch gets folded in this case?
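The request about `true` versus `false` exits comes down to which constant replaces the branch condition. The sketch below restates the expression used by `foldExit` in this diff (`IsTaken ? ExitIfTrue : !ExitIfTrue`) as a standalone C++ function; `foldedCondition` is a hypothetical name used only for illustration.

```cpp
// ExitIfTrue: the branch leaves the loop when its condition is true.
// IsTaken:    we have proved whether this exit is taken (true) or not (false).
// Returns the constant that would replace the branch condition.
inline bool foldedCondition(bool ExitIfTrue, bool IsTaken) {
  return IsTaken ? ExitIfTrue : !ExitIfTrue;
}
```

For an exit that is never taken, a branch that exits on `true` gets the constant `false`, and a branch that exits on `false` gets the constant `true`. This is consistent with the `br i1 false` and `br i1 true` CHECK lines for `exiting_2` and `exiting_3` in eliminate-backedge.ll.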

Harbormaster completed remote builds in B121714: Diff 369404. · Aug 30 2021, 3:01 AM

Resolved style issues and updated test to have an exiting branch on true condition.
Now we mark all exits after the one taken on the 1st iteration as not live, so that they never branch to exit.

dmakogon marked 3 inline comments as done. · Aug 31 2021, 12:19 AM

Harbormaster completed remote builds in B121889: Diff 369640. · Aug 31 2021, 12:47 AM

dmakogon planned changes to this revision. · Aug 31 2021, 1:44 AM

Try to simplify PHI uses after substituting the preheader value.
Now, if we prove that some exit is taken on the 1st iteration, we replace the conditions of all subsequent exiting branches with constants that keep control in the loop (instead of taking their exits).
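The PHI substitution mentioned here can be pictured with a small, self-contained C++ sketch. It is a toy model under stated assumptions: values are just strings, `ToyPhi` and `replaceHeaderPhiUses` are hypothetical names, and the real patch works on `PHINode` objects via `replaceAllUsesWith` instead.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a loop header PHI with exactly two incoming values,
// as guaranteed by Loop Simplify Form (one preheader, one latch).
struct ToyPhi {
  std::string Name;          // e.g. "%iv"
  std::string FromPreheader; // value on entry to the loop
  std::string FromLatch;     // value carried over the backedge
};

// Rewrites every operand that names a header PHI to that PHI's preheader
// value. This is only sound when the backedge is known never to be taken.
inline void replaceHeaderPhiUses(const std::vector<ToyPhi> &Phis,
                                 std::vector<std::string> &Operands) {
  std::map<std::string, std::string> Replacement;
  for (const ToyPhi &P : Phis)
    Replacement[P.Name] = P.FromPreheader;
  for (std::string &Op : Operands) {
    auto It = Replacement.find(Op);
    if (It != Replacement.end())
      Op = It->second;
  }
}
```

With the two PHIs from eliminate-backedge.ll (`%iv` and `%iv.wide`, both starting at `0`), every use of either PHI becomes `0`, which is what the CHECK lines such as `getelementptr inbounds i8, i8* %lhs, i32 0` reflect.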

Harbormaster completed remote builds in B122222: Diff 370138. · Sep 1 2021, 7:33 PM

Fix style issues

dmakogon edited the summary of this revision. · Sep 1 2021, 7:59 PM

Harbormaster completed remote builds in B122228: Diff 370143. · Sep 1 2021, 8:21 PM

dmakogon updated this revision to Diff 370160. · Sep 1 2021, 10:12 PM

Harbormaster completed remote builds in B122239: Diff 370160. · Sep 1 2021, 10:40 PM

Apply clang-format

Harbormaster completed remote builds in B122265: Diff 370205. · Sep 2 2021, 3:16 AM

mkazantsev added inline comments. · Sep 2 2021, 3:19 AM

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
1325	Not sure about this. Potentially expensive compile time wise, and not clear what consequences changing the content of other loops has. Might be reasonable for enabling more other transforms, though. Let's split it off and think through.

We no longer simplify the uses of the replaced PHI nodes; that is left to later passes.

Harbormaster completed remote builds in B122275: Diff 370222. · Sep 2 2021, 3:55 AM

Please can you precommit the test regeneration and the new tests?

Dima doesn't have access yet; I will check in the test now. @dmakogon please rebase after it's merged.

mkazantsev mentioned this in rG0f80961e8c72: [Test] Missed opt test for D108910. · Sep 2 2021, 10:46 PM

The new test file was merged with another patch, so I rebased this one and updated the test checks.

Harbormaster completed remote builds in B122720: Diff 370847. · Sep 5 2021, 11:14 PM

Fine by me. Roman?

I think there are two patches here - ExitsOnFirstIter, and replaceLoopPHINodesWithPreheaderValues() change.
I think it may be nice to split the patch into two, with replaceLoopPHINodesWithPreheaderValues() being the first one.

I think there's generality that may be missing here.
We know which exit exits on the first iteration, which means that all the following exits are not reached.
But does that tell us anything about the preceding exits?
I'm roughly thinking about this situation: https://godbolt.org/z/KdKb3jzqW (ignore that it is optimized already)

Also, much like with that simplifycfg patch, should we be emitting assumptions here?

All that being said, seems reasonable.

llvm/test/Transforms/IndVarSimplify/floating-point-iv.ll
345 ↗	(On Diff #370847)	Please regenerate the checklines in the affected test files before committing the actual patch.

This revision is now accepted and ready to land. · Sep 7 2021, 1:45 AM

dmakogon mentioned this in D109596: [IndVars] Replace PHIs if loop exits on 1st iteration. · Sep 10 2021, 4:53 AM

In D108910#2986030, @lebedev.ri wrote:

I think there are two patches here - ExitsOnFirstIter, and replaceLoopPHINodesWithPreheaderValues() change.
I think it may be nice to split the patch into two, with replaceLoopPHINodesWithPreheaderValues() being the first one.

Agreed. Dima, please split them up and I'll merge them.

In D108910#2986030, @lebedev.ri wrote:

I think there's generality that may be missing here.
We know which exit exits on the first iteration, which means that all the following exits are not reached.
But does that tell us anything about the preceding exits?

AFAIK getExitCount returns the exact exit count for a given block *under the assumption that this exit will be taken*. It doesn't account for preceding checks. In the general case, we can do the following optimization: given two exits A and B with exact counts known for both, where A dominates B and the exact exit count of A is strictly greater than that of B, we can remove exit A as never taken. Not sure if IndVars already does it, let me try to write a simple example...

BTW, every preceding check with a strictly positive exit count can be removed.
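The rule about two exits A and B can be checked on a toy loop. The C++ sketch below is an illustration only, with hypothetical names (`ToyExitTaken`, `simulateTwoExitLoop`, `dominatingExitNeverTaken`): it assumes each exit fires exactly when the induction variable equals that exit's count, and that A is checked before B on every iteration because A dominates B.

```cpp
// Which exit the toy loop leaves through.
enum class ToyExitTaken { A, B };

// Simulates a loop whose induction variable starts at 0 and steps by 1.
// Exit A fires when IV == CountA, exit B when IV == CountB; A is tested first.
inline ToyExitTaken simulateTwoExitLoop(unsigned CountA, unsigned CountB) {
  for (unsigned IV = 0;; ++IV) {
    if (IV == CountA)
      return ToyExitTaken::A;
    if (IV == CountB)
      return ToyExitTaken::B;
  }
}

// The condition from the comment above: A can be removed as never taken
// when its exact exit count is strictly greater than B's.
inline bool dominatingExitNeverTaken(unsigned CountA, unsigned CountB) {
  return CountA > CountB;
}
```

The comparison has to be strict: when both counts are equal, the dominating exit A is reached first on that iteration and is the one taken.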

mkazantsev accepted this revision. · Sep 12 2021, 8:32 PM

mkazantsev mentioned this in rG5a6dfb27ca74: [IndVars] Replace PHIs if loop exits on 1st iteration. · Sep 12 2021, 8:50 PM

Closed by commit rGd9ca444835e6: [IndVars] Break backedge and replace PHIs if loop exits on 1st iteration (authored by mkazantsev). · Sep 12 2021, 9:32 PM

This revision was automatically updated to reflect the committed changes.

mkazantsev added a commit: rGd9ca444835e6: [IndVars] Break backedge and replace PHIs if loop exits on 1st iteration.

In D108910#2996649, @mkazantsev wrote:

Not sure if IndVars already does it, let me try to write a simple example...

...Yes it does.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -indvars -S < %s | FileCheck %s
; RUN: opt -passes=indvars -S < %s | FileCheck %s

declare void @never_called()
declare void @will_be_called()

define void @test_01(i32 %a, i32 %b) {
; CHECK-LABEL: @test_01(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[GUARD_COND:%.*]] = icmp ugt i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    br i1 [[GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[FAILURE:%.*]]
; CHECK:       loop.preheader:
; CHECK-NEXT:    br label [[LOOP:%.*]]
; CHECK:       loop:
; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
; CHECK-NEXT:    br i1 false, label [[NEVER_CALLED:%.*]], label [[BACKEDGE]]
; CHECK:       backedge:
; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV]], 1
; CHECK-NEXT:    [[COND_2:%.*]] = icmp eq i32 [[IV]], [[B]]
; CHECK-NEXT:    br i1 [[COND_2]], label [[WILL_BE_CALLED:%.*]], label [[LOOP]]
; CHECK:       never_called:
; CHECK-NEXT:    call void @never_called()
; CHECK-NEXT:    ret void
; CHECK:       will_be_called:
; CHECK-NEXT:    call void @will_be_called()
; CHECK-NEXT:    ret void
; CHECK:       failure:
; CHECK-NEXT:    ret void
;
entry:
  %guard_cond = icmp ugt i32 %a, %b
  br i1 %guard_cond, label %loop, label %failure

loop:
  %iv = phi i32 [0, %entry], [%iv.next, %backedge]
  %cond_1 = icmp eq i32 %iv, %a
  br i1 %cond_1, label %never_called, label %backedge

backedge:
  %iv.next = add i32 %iv, 1
  %cond_2 = icmp eq i32 %iv, %b
  br i1 %cond_2, label %will_be_called, label %loop

never_called:
  call void @never_called()
  ret void

will_be_called:
  call void @will_be_called()
  ret void

failure:
  ret void
}

reames added a reverting change: rG5746c76f3fc9: Revert "[IndVars] Break backedge and replace PHIs if loop exits on 1st…. · Sep 13 2021, 10:12 AM

I have reverted this patch for a couple of reasons:

1. The commit message was incorrect. The change did not actually break the backedge.
2. There doesn't seem to have been any discussion in the review of why folding the exits to taken in provably dead code was the correct answer. In particular, why not fold to poison? Or just leave it to loop deletion? Motivation is important.

These comments should be easy to address. I reverted mostly because of the functional issue in the split off patch, and decided to revert both to be safe.

@reames I don't quite get the 2nd part of the revert motivation. Leaving it to loop deletion is cool if there will be loop deletion. Branching out of the loop looks better because SimplifyCFG can deal with it. Why would anything be better than leaving the loop (and potentially breaking it for further passes)? Any examples of this?

In D108910#3005884, @mkazantsev wrote:

@reames I don't quite get the 2nd part of the revert motivation. Leaving it to loop deletion is cool if there will be loop deletion. Branching out of the loop looks better because SimplifyCFG can deal with it. Why would anything be better than leaving the loop (and potentially breaking it for further passes)? Any examples of this?

Max,

Sorry for the delayed response, I apparently forgot to reply to this before leaving for vacation. My apologies.

My second point is really two questions:

1. Why do we need to fold these branches to constants at all? We've already proven that a dominating exit was taken, and thus the code being modified in this change is provably unreachable. I'd expect existing passes (SimplifyCFG, LoopDeletion, etc.) to handle the current form without issue. Do you have a case where this change actually affects the output after, say, simplify-cfg? I'd not expect that, and it almost seems like you might be covering up a missing transform elsewhere.
2. Given we're dealing with unreachable code, why fold to untaken? Why not be more aggressive about identifying the explicit UB? Replacing terminators with "unreachable" would require a non-trivial analysis update, so I get not doing that. But why not replace the condition with an explicit "poison"? That would be trivially immediate UB, and likely "clearer" for simplify-cfg and friends.

P.S. The title and description of this review still need to be updated.

Diff 369640

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

[1,303 lines hidden; enclosing function: static void foldExit(const Loop *L, BasicBlock *ExitingBB, bool IsTaken, ...]
   BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());
   bool ExitIfTrue = !L->contains(*succ_begin(ExitingBB));
   auto *OldCond = BI->getCondition();
   auto *NewCond =
       ConstantInt::get(OldCond->getType(), IsTaken ? ExitIfTrue : !ExitIfTrue);
   replaceExitCond(BI, NewCond, DeadInsts);
 }

+static void replaceLoopPHINodesWithPreheaderValues(Loop *L) {
+  auto *LoopPreheader = L->getLoopPreheader();
+  auto *LoopHeader = L->getHeader();
+  SmallVector<PHINode *, 4> LoopPHINodes;
+  for (auto &PN : LoopHeader->phis()) {
+    PN.replaceAllUsesWith(PN.getIncomingValueForBlock(LoopPreheader));
+    LoopPHINodes.push_back(&PN);
+  }
+  for (auto *PN : LoopPHINodes)
+    PN->eraseFromParent();
+}
+
 static void replaceWithInvariantCond(
     const Loop *L, BasicBlock *ExitingBB, ICmpInst::Predicate InvariantPred,
     const SCEV *InvariantLHS, const SCEV *InvariantRHS, SCEVExpander &Rewriter,
     SmallVectorImpl<WeakTrackingVH> &DeadInsts) {
   BranchInst *BI = cast<BranchInst>(ExitingBB->getTerminator());
   Rewriter.setInsertPoint(BI);
   auto *LHSV = Rewriter.expandCodeFor(InvariantLHS);
   auto *RHSV = Rewriter.expandCodeFor(InvariantRHS);
   bool ExitIfTrue = !L->contains(*succ_begin(ExitingBB));
   if (ExitIfTrue)
[127 lines hidden]
 #ifdef ASSERT
   for (unsigned i = 1; i < ExitingBlocks.size(); i++) {
     assert(DT->dominates(ExitingBlocks[i-1], ExitingBlocks[i]));
   }
 #endif

   bool Changed = false;
   bool SkipLastIter = false;
+  bool ExitsOnFirstIter = false;
-  SmallSet<const SCEV*, 8> DominatingExitCounts;
+  SmallSet<const SCEV *, 8> DominatingExitCounts;
   for (BasicBlock *ExitingBB : ExitingBlocks) {
+    if (ExitsOnFirstIter) {
+      // If proved that some earlier exit is taken
+      // on 1st iteration, then fold this one.
+      foldExit(L, ExitingBB, false, DeadInsts);
+      continue;
+    }
+
     const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
     if (isa<SCEVCouldNotCompute>(ExitCount)) {
       // Okay, we do not know the exit count here. Can we at least prove that it
       // will remain the same within iteration space?
       auto *BI = cast<BranchInst>(ExitingBB->getTerminator());
       auto OptimizeCond = [&](bool Inverted, bool SkipLastIter) {
         return optimizeLoopExitWithUnknownExitCount(
             L, BI, ExitingBB, MaxExitCount, Inverted, SkipLastIter, SE,
[27 lines hidden; enclosing scope: for (BasicBlock *ExitingBB : ExitingBlocks) {]
     if (MaxExitCount == ExitCount)
       // If the loop has more than 1 iteration, all further checks will be
       // executed 1 iteration less.
       SkipLastIter = true;

     // If we know we'd exit on the first iteration, rewrite the exit to
     // reflect this. This does not imply the loop must exit through this
     // exit; there may be an earlier one taken on the first iteration.
-    // TODO: Given we know the backedge can't be taken, we should go ahead
-    // and break it. Or at least, kill all the header phis and simplify.
+    // We know the backedge can't be taken, so we go ahead and break all
+    // branches in all further exiting blocks. Also we kill all
+    // header phis and simplify.
     if (ExitCount->isZero()) {
       foldExit(L, ExitingBB, true, DeadInsts);
+      replaceLoopPHINodesWithPreheaderValues(L);
       Changed = true;
+      ExitsOnFirstIter = true;
       continue;
     }

     assert(ExitCount->getType()->isIntegerTy() &&
            MaxExitCount->getType()->isIntegerTy() &&
            "Exit counts must be integers");

     Type *WiderType =
[465 lines hidden]

llvm/test/Transforms/IndVarSimplify/eliminate-backedge.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p
; RUN: opt < %s -indvars -S | FileCheck %s
; RUN: opt < %s -passes=indvars -S | FileCheck %s
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"

declare i1 @foo(i8*, i8*)
declare i1 @bar()
declare i1 @baz()

define i1 @kill_backedge_and_phis(i8* align 1 %lhs, i8* align 1 %rhs, i32 %len) {
; CHECK-LABEL: @kill_backedge_and_phis(
; CHECK-NEXT: entry:
; CHECK-NEXT: %length_not_zero = icmp ne i32 %len, 0
; CHECK-NEXT: br i1 %length_not_zero, label %loop_preheader, label %exit
; CHECK: loop_preheader:
; CHECK-NEXT: br label %loop
; CHECK: loop:
; CHECK-NEXT: %iv.wide.next = add nuw nsw i64 0, 1
; CHECK-NEXT: %left_ptr = getelementptr inbounds i8, i8* %lhs, i32 0
; CHECK-NEXT: %right_ptr = getelementptr inbounds i8, i8* %rhs, i32 0
; CHECK-NEXT: %result = call i1 @foo(i8* %left_ptr, i8* %right_ptr)
; CHECK-NEXT: br i1 %result, label %exiting_1, label %exit.loopexit
; CHECK: exiting_1:
; CHECK-NEXT: %iv.wide.is_not_zero = icmp ne i64 0, 0
; CHECK-NEXT: br i1 false, label %exiting_2, label %exit.loopexit
; CHECK: exiting_2:
; CHECK-NEXT: %bar_ret = call i1 @bar()
; CHECK-NEXT: br i1 false, label %exit.loopexit, label %exiting_3
; CHECK: exiting_3:
; CHECK-NEXT: %baz_ret = call i1 @baz()
; CHECK-NEXT: br i1 true, label %loop, label %exit.loopexit
; CHECK: exit.loopexit:
; CHECK-NEXT: %val.ph = phi i1 [ %baz_ret, %exiting_3 ], [ %bar_ret, %exiting_2 ], [ %iv.wide.is_not_zero, %exiting_1 ], [ %result, %loop ]
; CHECK-NEXT: br label %exit
; CHECK: exit:
; CHECK-NEXT: %val = phi i1 [ false, %entry ], [ %val.ph, %exit.loopexit ]
; CHECK-NEXT: ret i1 %val
;
entry:
  %length_not_zero = icmp ne i32 %len, 0
  br i1 %length_not_zero, label %loop_preheader, label %exit

loop_preheader:
  br label %loop

loop:
  %iv = phi i32 [ 0, %loop_preheader ], [ %iv.next, %latch ]
  %iv.wide = phi i64 [ 0, %loop_preheader ], [ %iv.wide.next, %latch ]
  %iv.next = add i32 %iv, 1
  %iv.wide.next = add i64 %iv.wide, 1
  %left_ptr = getelementptr inbounds i8, i8* %lhs, i32 %iv
  %right_ptr = getelementptr inbounds i8, i8* %rhs, i32 %iv
  %result = call i1 @foo(i8* %left_ptr, i8* %right_ptr)
  br i1 %result, label %exiting_1, label %exit

exiting_1:
  %iv.wide.is_not_zero = icmp ne i64 %iv.wide, 0
  br i1 %iv.wide.is_not_zero, label %exiting_2, label %exit

exiting_2:
  %bar_ret = call i1 @bar()
  br i1 %bar_ret, label %exit, label %exiting_3

exiting_3:
  %baz_ret = call i1 @baz()
  br i1 %baz_ret, label %latch, label %exit

latch:
  %continue = icmp ne i32 %iv.next, %len
  br i1 %continue, label %loop, label %exit

exit:
  %val = phi i1 [ %result, %loop ], [ %iv.wide.is_not_zero, %exiting_1 ],
                [ %bar_ret, %exiting_2 ], [ %baz_ret, %exiting_3 ],
                [ %baz_ret, %latch ], [ 0, %entry ]
  ret i1 %val
}

This is an archive of the discontinued LLVM Phabricator instance.

[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration
ClosedPublic
