This is an archive of the discontinued LLVM Phabricator instance.

IndVarSimplify: Don't let LFTR compare against a poison value
ClosedPublic

Authored by majnemer on Sep 3 2014, 10:02 AM.

Details

Reviewers

chandlerc
nicholas
atrick
nlewycky
hfinkel

Commits

rG37c07a6294a2: Merging rr217102: -------------------------------------------------------------…
rGc6ab01eccae0: IndVarSimplify: Don't let LFTR compare against a poison value
rL217102: IndVarSimplify: Don't let LFTR compare against a poison value

Summary

LinearFunctionTestReplace tries to use the *next* indvar to compare
against when possible. However, it may be the case that the calculation
for the next indvar has NUW/NSW flags and that it may only be safely
used inside the loop. Using it in a comparison to calculate the exit
condition could result in observing poison.

This fixes PR20680.
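
To make the failure mode concrete, here is a minimal standalone C++17 sketch (not LLVM code) that models an 8-bit `add nuw nsw` as a function returning `std::nullopt` in place of poison. The names `addNuwNsw` and `postIncExitTestSeesPoison` are hypothetical and exist only for this illustration. The point it tries to show: computing a wrapped post-increment is harmless as long as nothing reads it, but an exit test rewritten to compare against the post-incremented value does read it.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of an 8-bit `add nuw nsw`: returns std::nullopt
// (standing in for poison) if the sum wraps in the unsigned or signed sense.
std::optional<uint8_t> addNuwNsw(uint8_t A, uint8_t B) {
  unsigned UnsignedSum = unsigned(A) + unsigned(B);
  int SignedSum = int(static_cast<int8_t>(A)) + int(static_cast<int8_t>(B));
  bool Wraps = UnsignedSum > 0xFF || SignedSum < -128 || SignedSum > 127;
  if (Wraps)
    return std::nullopt;
  return static_cast<uint8_t>(UnsignedSum);
}

// Executes TripCount iterations starting at Start. Returns true if an exit
// test written against the post-incremented value would read poison on
// some iteration. A test on the pre-incremented value never reads Next.
bool postIncExitTestSeesPoison(uint8_t Start, unsigned TripCount) {
  uint8_t IV = Start;
  for (unsigned I = 0; I < TripCount; ++I) {
    std::optional<uint8_t> Next = addNuwNsw(IV, 1);
    if (!Next)
      return true; // the rewritten compare would have a poison operand
    IV = *Next;
  }
  return false;
}
```

In this model a loop that starts at 250 and runs six iterations reaches 255 on its last trip, so its final increment wraps; a loop that starts at 0 and runs ten iterations never does.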

Diff Detail

Repository: rL LLVM

Event Timeline

majnemer updated this revision to Diff 13216. (Sep 3 2014, 10:02 AM)

majnemer retitled this revision from to IndVarSimplify: Don't let LFTR compare against a poison value.

majnemer updated this object.

majnemer added reviewers: chandlerc, nicholas, nlewycky, atrick.

majnemer added a subscriber: Unknown Object (MLST).

hfinkel added a subscriber: hfinkel. (Sep 3 2014, 10:47 AM)

hfinkel added inline comments.

lib/Transforms/Scalar/IndVarSimplify.cpp
Line 1652 (on Diff #13216): I'm not 100% sure this is right. I think that IncrementedIndvarSCEV->getNoWrapFlags() will return only the deduced flags (the ones that SCEV can independently prove), and that could be different from the set of flags on the instruction itself. It might be better to test the flags on IncrementedIndvar itself (but, then again, I could just be wrong).

We talked about this on IRC; taking the flags from SCEV seems reasonable because SCEV is the one that generated the instruction in the first place (although a comment should be added to that effect).

LGTM.

This revision is now accepted and ready to land. (Sep 3 2014, 3:37 PM)

Closed by commit rL217102 (authored by @majnemer).

I'll say LGTM because fixing a bug is an improvement. This is unfortunate though because LFTR does not care about overflow and we don't want to pessimize code just because SCEV was able to infer NSW. The problem is that we can't express that NSW applies to some uses and not others.

The fix is also not totally robust because SCEV could, in theory, drop NSW flags even though they exist in IR. The induction variable was not necessarily produced by SCEVExpander. I think making it totally robust would involve getting a new expression for CmpIndVar by applying non-NSW AddExpr's (similar to genLoopLimit). That is of course much more complexity, hence risk, so maybe best as a FIXME.

Out of curiosity, can you explain how we came to generate bad code for this case (so I don't have to debug the test case)? Was there an optimization after LFTR that assumed NSW at the compare? Do we end up widening the loop test and failing to exit?

FWIW: The check against FlagAnyWrap was a little confusing to me, as opposed to simply:
if (maskFlags(Expr->getNoWrapFlags(), SCEV::NSW | SCEV::NUW))
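
The two ways of writing the flag test can be compared with a small standalone sketch. It is a stand-in, not the SCEV code: the enum below only imitates the shape of SCEV's no-wrap flags, its numeric values are an assumption made for illustration, and `hasWrapFlag` / `canUsePostInc` are hypothetical helper names. Under those assumptions, masking with NUW | NSW and comparing against `FlagAnyWrap` is the exact complement of testing the masked value for truthiness, so the shorter form needs the opposite branch sense.

```cpp
// Stand-in for SCEV's no-wrap flags; the bit values are assumed for this
// illustration and are not copied from LLVM.
enum NoWrapFlags : unsigned {
  FlagAnyWrap = 0,   // no guarantee at all
  FlagNW = 1u << 0,  // no self-wrap
  FlagNUW = 1u << 1, // no unsigned wrap
  FlagNSW = 1u << 2, // no signed wrap
};

// Keeps only the flags selected by Mask.
constexpr NoWrapFlags maskFlags(unsigned Flags, unsigned Mask) {
  return static_cast<NoWrapFlags>(Flags & Mask);
}

// Short form: true when NUW or NSW is present.
constexpr bool hasWrapFlag(unsigned Flags) {
  return maskFlags(Flags, FlagNUW | FlagNSW);
}

// Explicit comparison form: true when neither NUW nor NSW is present,
// i.e. when comparing against the post-incremented value is acceptable.
constexpr bool canUsePostInc(unsigned Flags) {
  return maskFlags(Flags, FlagNUW | FlagNSW) == FlagAnyWrap;
}

static_assert(canUsePostInc(FlagNW) == !hasWrapFlag(FlagNW),
              "the two forms are complements");
```

Note that a flag outside the mask (here `FlagNW`) does not affect either predicate.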

In D5174#13, @atrick wrote:

I'll say LGTM because fixing a bug is an improvement. This is unfortunate though because LFTR does not care about overflow and we don't want to pessimize code just because SCEV was able to infer NSW. The problem is that we can't express that NSW applies to some uses and not others.

I agree, but using the induction variable seemed better than introducing another addition; I don't know of any machinery in LLVM that will merge a NSW "add" and a normal "add" into a single, normal, add.

The fix is also not totally robust because SCEV could, in theory, drop NSW flags even though they exist in IR. The induction variable was not necessarily produced by SCEVExpander. I think making it totally robust would involve getting a new expression for CmpIndVar by applying non-NSW AddExpr's (similar to genLoopLimit). That is of course much more complexity, hence risk, so maybe best as a FIXME.

I'll add a FIXME.

Out of curiosity, can you explain how we came to generate bad code for this case (so I don't have to debug the test case)? Was there an optimization after LFTR that assumed NSW at the compare? Do we end up widening the loop test and failing to exit?

I think we inferred that the rewritten exit condition was unreachable because of the NUW/NSW behavior of the SCEV.

Before my patch, we had BBs that looked like this:

for.inc13.i:
  %indvars.iv.next.i = add nuw nsw i32 %indvars.iv.i, 1
  br i1 true, label %for.cond2.preheader.i, label %fn1.exit.us-lcssa.us-lcssa

After my patch, the same BB looks like this:

for.inc13.i:
  %indvars.iv.next.i = add nuw nsw i32 %indvars.iv.i, 1
  br i1 %exitcond19.i, label %for.cond2.preheader.i, label %fn1.exit.us-lcssa.us-lcssa
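
The numbers in the pr20680.ll test added by this revision give a worked instance. The sketch below is a standalone C++ model, not compiler output: it replays the original i8 loop (start at -14, continue while the incremented value is unsigned-greater-than 50) and the rewritten i32 loop that the new CHECK lines expect (start at -14, continue while the pre-incremented value is not -1). The function names are hypothetical. It also records whether the increment on the final trip wraps in the unsigned sense, which is when an `add nuw` would yield poison.

```cpp
#include <cstdint>

// Original loop shape from pr20680.ll: i8 counter starting at -14,
// latch continues while (counter + 1) >u 50.
unsigned tripCountOriginal() {
  uint8_t IV = static_cast<uint8_t>(-14);
  unsigned Trips = 0;
  for (;;) {
    ++Trips;
    uint8_t Inc = static_cast<uint8_t>(IV + 1);
    if (!(Inc > 50))
      break;
    IV = Inc;
  }
  return Trips;
}

// Rewritten loop shape: i32 counter starting at -14, latch continues while
// the pre-incremented counter != -1. LastIncWrapsUnsigned reports whether
// the increment on the final trip would wrap as an unsigned 32-bit add.
unsigned tripCountRewritten(bool &LastIncWrapsUnsigned) {
  int32_t IV = -14;
  unsigned Trips = 0;
  for (;;) {
    ++Trips;
    LastIncWrapsUnsigned = (static_cast<uint32_t>(IV) == UINT32_MAX);
    if (!(IV != -1))
      break;
    IV = IV + 1;
  }
  return Trips;
}
```

Both loops run the same number of trips in this model, and the final increment of the i32 counter goes from -1 to 0, which wraps as an unsigned add. That is consistent with the explanation above: a compare placed on the post-incremented value would consume exactly that wrapped result, while the compare on the pre-incremented value does not.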

FWIW: The check against FlagAnyWrap was a little confusing to me, as opposed to simply:
if (maskFlags(Expr->getNoWrapFlags(), SCEV::NSW | SCEV::NUW))

I'll clean this up as well.

Review comments addressed in rL217115.

Hi all,

We've been seeing some performance regressions in our internal AArch64 tests due to this patch. I'll preface this by saying that I'm not an expert in this matter at all so take everything I say with a grain of salt :)

I've reduced the test case (this one is from LNT) to what's attached (test.c, 897 B). Cleaned up, the IR looks something like this:

for.cond1.preheader:                              ; preds = %middle.block, %entry
  %indvars.iv17 = phi i64 [ 0, %entry ], [ %indvars.iv.next18, %middle.block ]
  %indvars.iv.next18 = add nuw nsw i64 %indvars.iv17, 1
  ...

middle.block:                                     ; preds = %vector.body
  %exitcond19 = icmp eq i64 %indvars.iv17, 255
  br i1 %exitcond19, label %for.end10, label %for.cond1.preheader

Using -target aarch64-linux-gnu -mcpu=cortex-a57 -O3, this compiles to the following assembly, also shortened:

.LBB0_1:                                // %for.cond1.preheader
                                        // =>This Loop Header: Depth=1
                                        //     Child Loop BB0_2 Depth 2
        add     x11, x10, #1            // =1
        ...
// BB#3:                                // %middle.block
                                        //   in Loop: Header=BB0_1 Depth=1
        cmp     x10, #255               // =255
        mov     x10, x11
        b.ne    .LBB0_1

It seems like the exact problem the patch addresses occurs: since the calculation for the next indvar has NUW/NSW flags set, it can't be used to check the exit condition. This leads to an extra register being used for the induction variable.

In D5174#13, @atrick wrote:

I'll say LGTM because fixing a bug is an improvement. This is unfortunate though because LFTR does not care about overflow and we don't want to pessimize code just because SCEV was able to infer NSW. The problem is that we can't express that NSW applies to some uses and not others.

Am I right in thinking that the behaviour in this specific test case is too conservative as a result of the patch? (An i64 won't overflow, since we know the trip count is only 256.)

If yes, do you have any suggestions on how we could fix this? I'm assuming this might also affect other targets, at least similar ones such as ARM/AArch32 or Thumb. Any input would be welcome.

Cheers
Moritz

kongyi added a subscriber: kongyi. (Sep 5 2014, 8:46 AM)

What you've done is the right idea. We want to allow post-increment comparison whenever we can prove no overflow on the increment. Handling the constant case as you've done seems safe, albeit messy.

A more general way to check overflow is something like this (I haven't tested this in your case):

unsigned BitWidth = SE->getTypeSizeInBits(BackedgeTakenCount->getType());
// BitWidth+1 should actually be sufficient
Type *WideTy = IntegerType::get(SE->getContext(), BitWidth * 2);
// Trip count computed in the original (narrow) type.
IVCount = SE->getAddExpr(BackedgeTakenCount,
                         SE->getConstant(BackedgeTakenCount->getType(), 1));
// Trip count computed in the wide type, where the add cannot overflow.
const SCEV *IVCountWide = SE->getAddExpr(
                SE->getSignExtendExpr(BackedgeTakenCount, WideTy),
                SE->getConstant(WideTy, 1));
if (SE->getSignExtendExpr(IVCount, WideTy) == IVCountWide) {
  // no signed overflow
}

You may want to give this a try and make sure it's still safe in the problem case.
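
The idea behind that check can be tried on concrete values. The following standalone C++ sketch is a scaled-down analogue, not SCEV code: it uses an 8-bit value widened to 16 bits instead of symbolic expressions, and the function name `incrementHasNoSignedOverflow` is hypothetical. It compares "add in the narrow type, then sign-extend" against "sign-extend, then add in the wide type"; the two agree exactly when the narrow increment does not overflow in the signed sense.

```cpp
#include <cstdint>

// Returns true if X + 1 does not overflow as a signed 8-bit addition.
bool incrementHasNoSignedOverflow(int8_t X) {
  // Narrow add with wrapping semantics, done through unsigned arithmetic to
  // avoid relying on signed overflow in C++.
  uint8_t WrappedBits = static_cast<uint8_t>(static_cast<uint8_t>(X) + 1u);
  int8_t Narrow = static_cast<int8_t>(WrappedBits);

  int16_t NarrowThenExtended = Narrow;                       // sext(X + 1)
  int16_t ExtendedThenAdded = static_cast<int16_t>(X) + 1;   // sext(X) + 1
  return NarrowThenExtended == ExtendedThenAdded;
}
```

For X = 127 the narrow add wraps to -128 while the wide add gives 128, so the comparison fails; for every other 8-bit value the two results match. Doubling the width, as in the snippet above, is more than this needs, which matches the remark that one extra bit should be sufficient.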

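Revision Contents

Path                                                                            Size
llvm/trunk/lib/Transforms/Scalar/IndVarSimplify.cpp                             24 lines
llvm/trunk/test/Transforms/IndVarSimplify/2011-10-27-lftrnull.ll                2 lines
llvm/trunk/test/Transforms/IndVarSimplify/lftr-address-space-pointers.ll        4 lines
llvm/trunk/test/Transforms/IndVarSimplify/lftr-extend-const.ll                  4 lines
llvm/trunk/test/Transforms/IndVarSimplify/lftr-reuse.ll                         24 lines
llvm/trunk/test/Transforms/IndVarSimplify/pr20680.ll                            219 lines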

Diff 13233

llvm/trunk/lib/Transforms/Scalar/IndVarSimplify.cpp

[1,635 preceding lines not shown; context: LinearFunctionTestReplace(Loop *L,]
   // Initialize CmpIndVar and IVCount to their preincremented values.
   Value *CmpIndVar = IndVar;
   const SCEV *IVCount = BackedgeTakenCount;

   // If the exiting block is the same as the backedge block, we prefer to
   // compare against the post-incremented value, otherwise we must compare
   // against the preincremented value.
   if (L->getExitingBlock() == L->getLoopLatch()) {
+    // The BackedgeTaken expression contains the number of times that the
+    // backedge branches to the loop header. This is one less than the
+    // number of times the loop executes, so use the incremented indvar.
+    llvm::Value *IncrementedIndvar =
+        IndVar->getIncomingValueForBlock(L->getExitingBlock());
+    const auto *IncrementedIndvarSCEV =
+        cast<SCEVAddRecExpr>(SE->getSCEV(IncrementedIndvar));
+    // It is unsafe to use the incremented indvar if it has a wrapping flag, we
+    // don't want to compare against a poison value. Check the SCEV that
+    // corresponds to the incremented indvar, the SCEVExpander will only insert
+    // flags in the IR if the SCEV originally had wrapping flags.
+    if (ScalarEvolution::maskFlags(IncrementedIndvarSCEV->getNoWrapFlags(),
+                                   SCEV::FlagNUW | SCEV::FlagNSW) ==
+        SCEV::FlagAnyWrap) {
-    // Add one to the "backedge-taken" count to get the trip count.
-    // This addition may overflow, which is valid as long as the comparison is
-    // truncated to BackedgeTakenCount->getType().
-    IVCount = SE->getAddExpr(BackedgeTakenCount,
-                             SE->getConstant(BackedgeTakenCount->getType(), 1));
-    // The BackedgeTaken expression contains the number of times that the
-    // backedge branches to the loop header. This is one less than the
-    // number of times the loop executes, so use the incremented indvar.
-    CmpIndVar = IndVar->getIncomingValueForBlock(L->getExitingBlock());
+      // Add one to the "backedge-taken" count to get the trip count.
+      // This addition may overflow, which is valid as long as the comparison is
+      // truncated to BackedgeTakenCount->getType().
+      IVCount =
+          SE->getAddExpr(BackedgeTakenCount,
+                         SE->getConstant(BackedgeTakenCount->getType(), 1));
+      CmpIndVar = IncrementedIndvar;
+    }
   }

   Value *ExitCnt = genLoopLimit(IndVar, IVCount, L, Rewriter, SE);
   assert(ExitCnt->getType()->isPointerTy() == IndVar->getType()->isPointerTy()
          && "genLoopLimit missed a cast");

   // Insert a new icmp_ne or icmp_eq instruction before the branch.
   BranchInst *BI = cast<BranchInst>(L->getExitingBlock()->getTerminator());
[276 following lines not shown]

llvm/trunk/test/Transforms/IndVarSimplify/2011-10-27-lftrnull.ll

 ; RUN: opt < %s -indvars -S | FileCheck %s
 ; rdar://10359193: assert "IndVar type must match IVInit type"

 target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"
 target triple = "thumbv7-apple-darwin"

 ; CHECK-LABEL: @test(
 ; CHECK: if.end.i126:
-; CHECK: %exitcond = icmp ne i8* %incdec.ptr.i, getelementptr (i8* null, i32 undef)
+; CHECK: %exitcond = icmp ne i8* %destYPixelPtr.010.i, getelementptr (i8* null, i32 undef)
 define void @test() nounwind {
 entry:
   br label %while.cond

 while.cond:
   br i1 undef, label %while.end, label %while.body

 while.body: ; preds = %while.cond
[42 following lines not shown]

llvm/trunk/test/Transforms/IndVarSimplify/lftr-address-space-pointers.ll

 ; RUN: opt -S -indvars -o - %s | FileCheck %s
 target datalayout = "e-p:32:32:32-p1:64:64:64-p2:8:8:8-p3:16:16:16-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:32-n8:16:32:64"

 ; Derived from ptriv in lftr-reuse.ll
 define void @ptriv_as2(i8 addrspace(2)* %base, i32 %n) nounwind {
 ; CHECK-LABEL: @ptriv_as2(
 entry:
   %idx.trunc = trunc i32 %n to i8
   %add.ptr = getelementptr inbounds i8 addrspace(2)* %base, i8 %idx.trunc
   %cmp1 = icmp ult i8 addrspace(2)* %base, %add.ptr
   br i1 %cmp1, label %for.body, label %for.end

 ; Make sure the added GEP has the right index type
-; CHECK: %lftr.limit = getelementptr i8 addrspace(2)* %base, i8 %0
+; CHECK: %lftr.limit = getelementptr i8 addrspace(2)* %base, i8

 ; CHECK: for.body:
 ; CHECK: phi i8 addrspace(2)*
 ; CHECK-NOT: phi
 ; CHECK-NOT: add{{^rspace}}
 ; CHECK: icmp ne i8 addrspace(2)*
 ; CHECK: br i1
 for.body:
[15 lines not shown]
 ; CHECK-LABEL: @ptriv_as3(
 entry:
   %idx.trunc = trunc i32 %n to i16
   %add.ptr = getelementptr inbounds i8 addrspace(3)* %base, i16 %idx.trunc
   %cmp1 = icmp ult i8 addrspace(3)* %base, %add.ptr
   br i1 %cmp1, label %for.body, label %for.end

 ; Make sure the added GEP has the right index type
-; CHECK: %lftr.limit = getelementptr i8 addrspace(3)* %base, i16 %0
+; CHECK: %lftr.limit = getelementptr i8 addrspace(3)* %base, i16

 ; CHECK: for.body:
 ; CHECK: phi i8 addrspace(3)*
 ; CHECK-NOT: phi
 ; CHECK-NOT: add{{^rspace}}
 ; CHECK: icmp ne i8 addrspace(3)*
 ; CHECK: br i1
 for.body:
[15 lines not shown]

llvm/trunk/test/Transforms/IndVarSimplify/lftr-extend-const.ll

 ;RUN: opt -S %s -indvars | FileCheck %s

 ; CHECK-LABEL: @foo(
 ; CHECK-NOT: %lftr.wideiv = trunc i32 %indvars.iv.next to i16
-; CHECK: %exitcond = icmp ne i32 %indvars.iv.next, 512
+; CHECK: %exitcond = icmp ne i32 %indvars.iv, 511
 define void @foo() #0 {
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %i.01 = phi i16 [ 0, %entry ], [ %inc, %for.body ]
   %conv2 = sext i16 %i.01 to i32
   call void @bar(i32 %conv2) #1
   %inc = add i16 %i.01, 1
   %cmp = icmp slt i16 %inc, 512
   br i1 %cmp, label %for.body, label %for.end

 for.end: ; preds = %for.body
   ret void
 }

 ; Check that post-incrementing the backedge taken count does not overflow.
 ; CHECK-LABEL: @postinc(
-; CHECK: icmp eq i32 %indvars.iv.next, 256
+; CHECK: icmp eq i32 %indvars.iv, 255
 define i32 @postinc() #0 {
 entry:
   br label %do.body

 do.body: ; preds = %do.body, %entry
   %first.0 = phi i8 [ 0, %entry ], [ %inc, %do.body ]
   %conv = zext i8 %first.0 to i32
   call void @bar(i32 %conv) #1
[12 following lines not shown]

llvm/trunk/test/Transforms/IndVarSimplify/lftr-reuse.ll

[76 preceding lines not shown]
 exit:
   ret void
 }

 ; Force SCEVExpander to look for an existing well-formed phi.
 ; Perform LFTR without generating extra preheader code.
 define void @guardedloop([0 x double]* %matrix, [0 x double]* %vector,
                          i32 %irow, i32 %ilead) nounwind {
-; CHECK: entry:
-; CHECK-NOT: zext
-; CHECK-NOT: add
-; CHECK: loop:
-; CHECK: phi i64
-; CHECK: phi i64
+; CHECK-LABEL: @guardedloop(
+; CHECK-LABEL: entry:
+; CHECK-NEXT: %[[cmp:.*]] = icmp slt i32 1, %irow
+; CHECK-NEXT: br i1 %[[cmp]], label %[[loop_preheader:.*]], label %[[return:.*]]
+
+; CHECK: [[loop_preheader]]:
+; CHECK-NEXT: %[[sext:.*]] = sext i32 %ilead to i64
+; CHECK-NEXT: %[[add:.*]] = add i32 %irow, -1
+; CHECK-NEXT: br label %[[loop:.*]]
+
+; CHECK: [[loop]]:
+; CHECK-NEXT: %[[indvars_iv2:.*]] = phi i64
+; CHECK-NEXT: phi i64
 ; CHECK-NOT: phi
-; CHECK: icmp ne
-; CHECK: br i1
+; CHECK: %[[lftr_wideiv:.*]] = trunc i64 %[[indvars_iv2]] to i32
+; CHECK-NEXT: %[[exitcond:.*]] = icmp ne i32 %[[lftr_wideiv]], %[[add]]
+; CHECK-NEXT: br i1 %[[exitcond]], label %[[loop]], label
 entry:
   %cmp = icmp slt i32 1, %irow
   br i1 %cmp, label %loop, label %return

 loop:
   %rowidx = phi i32 [ 0, %entry ], [ %row.inc, %loop ]
   %i = phi i32 [ 0, %entry ], [ %i.inc, %loop ]
   %diagidx = add nsw i32 %rowidx, %i
[130 following lines not shown]

llvm/trunk/test/Transforms/IndVarSimplify/pr20680.ll

				; RUN: opt < %s -indvars -S | FileCheck %s

				@a = common global i32 0, align 4
				@c = common global i32 0, align 4
				@b = common global i32 0, align 4

				define void @f() {
				; CHECK-LABEL: @f(
				; CHECK-LABEL: entry:
				; CHECK: br label %[[for_cond2_preheader:.*]]

				; CHECK: [[for_cond2_preheader]]:
				; CHECK-NEXT: %[[indvars_iv:.*]] = phi i32 [ %[[indvars_iv_next:.*]], %[[for_inc13:.*]] ], [ -14, %entry ]
				; br i1 {{.*}}, label %[[for_inc13]], label %
				entry:
				%0 = load i32* @a, align 4
				%tobool2 = icmp eq i32 %0, 0
				%1 = load i32* @a, align 4
				%tobool = icmp eq i32 %1, 0
				br label %for.cond2.preheader

				for.cond2.preheader: ; preds = %for.inc13, %entry
				%storemerge15 = phi i8 [ -14, %entry ], [ %inc14, %for.inc13 ]
				br i1 %tobool2, label %for.inc13, label %for.body3.lr.ph

				for.body3.lr.ph: ; preds = %for.cond2.preheader
				%tobool5 = icmp eq i8 %storemerge15, 0
				%conv7 = sext i8 %storemerge15 to i32
				%2 = add nsw i32 %conv7, 1
				%3 = icmp ult i32 %2, 3
				%div = select i1 %3, i32 %conv7, i32 0
				br i1 %tobool5, label %for.body3.lr.ph.split.us, label %for.body3.lr.ph.for.body3.lr.ph.split_crit_edge

				for.body3.lr.ph.for.body3.lr.ph.split_crit_edge: ; preds = %for.body3.lr.ph
				br label %for.body3.lr.ph.split

				for.body3.lr.ph.split.us: ; preds = %for.body3.lr.ph
				br i1 %tobool, label %for.body3.lr.ph.split.us.split.us, label %for.body3.lr.ph.split.us.for.body3.lr.ph.split.us.split_crit_edge

				for.body3.lr.ph.split.us.for.body3.lr.ph.split.us.split_crit_edge: ; preds = %for.body3.lr.ph.split.us
				br label %for.body3.lr.ph.split.us.split

				for.body3.lr.ph.split.us.split.us: ; preds = %for.body3.lr.ph.split.us
				br label %for.body3.us.us

				for.body3.us.us: ; preds = %for.cond2.loopexit.us.us, %for.body3.lr.ph.split.us.split.us
				br i1 true, label %cond.false.us.us, label %cond.end.us.us

				cond.false.us.us: ; preds = %for.body3.us.us
				br label %cond.end.us.us

				cond.end.us.us: ; preds = %cond.false.us.us, %for.body3.us.us
				%cond.us.us = phi i32 [ %div, %cond.false.us.us ], [ %conv7, %for.body3.us.us ]
				%4 = load i32* @b, align 4
				%cmp91.us.us = icmp slt i32 %4, 1
				br i1 %cmp91.us.us, label %for.inc.lr.ph.us.us, label %for.cond2.loopexit.us.us

				for.cond2.loopexit.us.us: ; preds = %for.cond8.for.cond2.loopexit_crit_edge.us.us, %cond.end.us.us
				br i1 true, label %for.cond2.for.inc13_crit_edge.us-lcssa.us.us-lcssa.us, label %for.body3.us.us

				for.inc.lr.ph.us.us: ; preds = %cond.end.us.us
				br label %for.inc.us.us

				for.cond8.for.cond2.loopexit_crit_edge.us.us: ; preds = %for.inc.us.us
				%inc.lcssa.us.us = phi i32 [ %inc.us.us, %for.inc.us.us ]
				store i32 %inc.lcssa.us.us, i32* @b, align 4
				br label %for.cond2.loopexit.us.us

				for.inc.us.us: ; preds = %for.inc.us.us, %for.inc.lr.ph.us.us
				%5 = phi i32 [ %4, %for.inc.lr.ph.us.us ], [ %inc.us.us, %for.inc.us.us ]
				%inc.us.us = add nsw i32 %5, 1
				%cmp9.us.us = icmp slt i32 %inc.us.us, 1
				br i1 %cmp9.us.us, label %for.inc.us.us, label %for.cond8.for.cond2.loopexit_crit_edge.us.us

				for.cond2.for.inc13_crit_edge.us-lcssa.us.us-lcssa.us: ; preds = %for.cond2.loopexit.us.us
				%cond.lcssa.ph.us.ph.us = phi i32 [ %cond.us.us, %for.cond2.loopexit.us.us ]
				br label %for.cond2.for.inc13_crit_edge.us-lcssa.us

				for.body3.lr.ph.split.us.split: ; preds = %for.body3.lr.ph.split.us.for.body3.lr.ph.split.us.split_crit_edge
				br label %for.body3.us

				for.body3.us: ; preds = %for.cond2.loopexit.us, %for.body3.lr.ph.split.us.split
				br i1 true, label %cond.false.us, label %cond.end.us

				cond.false.us: ; preds = %for.body3.us
				br label %cond.end.us

				cond.end.us: ; preds = %cond.false.us, %for.body3.us
				%cond.us = phi i32 [ %div, %cond.false.us ], [ %conv7, %for.body3.us ]
				%6 = load i32* @b, align 4
				%cmp91.us = icmp slt i32 %6, 1
				br i1 %cmp91.us, label %for.inc.lr.ph.us, label %for.cond2.loopexit.us

				for.inc.us: ; preds = %for.inc.lr.ph.us, %for.inc.us
				%7 = phi i32 [ %6, %for.inc.lr.ph.us ], [ %inc.us, %for.inc.us ]
				%inc.us = add nsw i32 %7, 1
				%cmp9.us = icmp slt i32 %inc.us, 1
				br i1 %cmp9.us, label %for.inc.us, label %for.cond8.for.cond2.loopexit_crit_edge.us

				for.cond2.loopexit.us: ; preds = %for.cond8.for.cond2.loopexit_crit_edge.us, %cond.end.us
				br i1 false, label %for.cond2.for.inc13_crit_edge.us-lcssa.us.us-lcssa, label %for.body3.us

				for.inc.lr.ph.us: ; preds = %cond.end.us
				br label %for.inc.us

				for.cond8.for.cond2.loopexit_crit_edge.us: ; preds = %for.inc.us
				%inc.lcssa.us = phi i32 [ %inc.us, %for.inc.us ]
				store i32 %inc.lcssa.us, i32* @b, align 4
				br label %for.cond2.loopexit.us

				for.cond2.for.inc13_crit_edge.us-lcssa.us.us-lcssa: ; preds = %for.cond2.loopexit.us
				%cond.lcssa.ph.us.ph = phi i32 [ %cond.us, %for.cond2.loopexit.us ]
				br label %for.cond2.for.inc13_crit_edge.us-lcssa.us

				for.cond2.for.inc13_crit_edge.us-lcssa.us: ; preds = %for.cond2.for.inc13_crit_edge.us-lcssa.us.us-lcssa, %for.cond2.for.inc13_crit_edge.us-lcssa.us.us-lcssa.us
				%cond.lcssa.ph.us = phi i32 [ %cond.lcssa.ph.us.ph, %for.cond2.for.inc13_crit_edge.us-lcssa.us.us-lcssa ], [ %cond.lcssa.ph.us.ph.us, %for.cond2.for.inc13_crit_edge.us-lcssa.us.us-lcssa.us ]
				br label %for.cond2.for.inc13_crit_edge

				for.body3.lr.ph.split: ; preds = %for.body3.lr.ph.for.body3.lr.ph.split_crit_edge
				br i1 %tobool, label %for.body3.lr.ph.split.split.us, label %for.body3.lr.ph.split.for.body3.lr.ph.split.split_crit_edge

				for.body3.lr.ph.split.for.body3.lr.ph.split.split_crit_edge: ; preds = %for.body3.lr.ph.split
				br label %for.body3.lr.ph.split.split

				for.body3.lr.ph.split.split.us: ; preds = %for.body3.lr.ph.split
				br label %for.body3.us3

				for.body3.us3: ; preds = %for.cond2.loopexit.us11, %for.body3.lr.ph.split.split.us
				br i1 false, label %cond.false.us4, label %cond.end.us5

				cond.false.us4: ; preds = %for.body3.us3
				br label %cond.end.us5

				cond.end.us5: ; preds = %cond.false.us4, %for.body3.us3
				%cond.us6 = phi i32 [ %div, %cond.false.us4 ], [ %conv7, %for.body3.us3 ]
				%8 = load i32* @b, align 4
				%cmp91.us7 = icmp slt i32 %8, 1
				br i1 %cmp91.us7, label %for.inc.lr.ph.us12, label %for.cond2.loopexit.us11

				for.inc.us8: ; preds = %for.inc.lr.ph.us12, %for.inc.us8
				%9 = phi i32 [ %8, %for.inc.lr.ph.us12 ], [ %inc.us9, %for.inc.us8 ]
				%inc.us9 = add nsw i32 %9, 1
				%cmp9.us10 = icmp slt i32 %inc.us9, 1
				br i1 %cmp9.us10, label %for.inc.us8, label %for.cond8.for.cond2.loopexit_crit_edge.us13

				for.cond2.loopexit.us11: ; preds = %for.cond8.for.cond2.loopexit_crit_edge.us13, %cond.end.us5
				br i1 true, label %for.cond2.for.inc13_crit_edge.us-lcssa.us-lcssa.us, label %for.body3.us3

				for.inc.lr.ph.us12: ; preds = %cond.end.us5
				br label %for.inc.us8

				for.cond8.for.cond2.loopexit_crit_edge.us13: ; preds = %for.inc.us8
				%inc.lcssa.us14 = phi i32 [ %inc.us9, %for.inc.us8 ]
				store i32 %inc.lcssa.us14, i32* @b, align 4
				br label %for.cond2.loopexit.us11

				for.cond2.for.inc13_crit_edge.us-lcssa.us-lcssa.us: ; preds = %for.cond2.loopexit.us11
				%cond.lcssa.ph.ph.us = phi i32 [ %cond.us6, %for.cond2.loopexit.us11 ]
				br label %for.cond2.for.inc13_crit_edge.us-lcssa

				for.body3.lr.ph.split.split: ; preds = %for.body3.lr.ph.split.for.body3.lr.ph.split.split_crit_edge
				br label %for.body3

				for.cond8.for.cond2.loopexit_crit_edge: ; preds = %for.inc
				%inc.lcssa = phi i32 [ %inc, %for.inc ]
				store i32 %inc.lcssa, i32* @b, align 4
				br label %for.cond2.loopexit

				for.cond2.loopexit: ; preds = %cond.end, %for.cond8.for.cond2.loopexit_crit_edge
				br i1 false, label %for.cond2.for.inc13_crit_edge.us-lcssa.us-lcssa, label %for.body3

				for.body3: ; preds = %for.cond2.loopexit, %for.body3.lr.ph.split.split
				br i1 false, label %cond.false, label %cond.end

				cond.false: ; preds = %for.body3
				br label %cond.end

				cond.end: ; preds = %cond.false, %for.body3
				%cond = phi i32 [ %div, %cond.false ], [ %conv7, %for.body3 ]
				%10 = load i32* @b, align 4
				%cmp91 = icmp slt i32 %10, 1
				br i1 %cmp91, label %for.inc.lr.ph, label %for.cond2.loopexit

				for.inc.lr.ph: ; preds = %cond.end
				br label %for.inc

				for.inc: ; preds = %for.inc, %for.inc.lr.ph
				%11 = phi i32 [ %10, %for.inc.lr.ph ], [ %inc, %for.inc ]
				%inc = add nsw i32 %11, 1
				%cmp9 = icmp slt i32 %inc, 1
				br i1 %cmp9, label %for.inc, label %for.cond8.for.cond2.loopexit_crit_edge

				for.cond2.for.inc13_crit_edge.us-lcssa.us-lcssa: ; preds = %for.cond2.loopexit
				%cond.lcssa.ph.ph = phi i32 [ %cond, %for.cond2.loopexit ]
				br label %for.cond2.for.inc13_crit_edge.us-lcssa

				for.cond2.for.inc13_crit_edge.us-lcssa: ; preds = %for.cond2.for.inc13_crit_edge.us-lcssa.us-lcssa, %for.cond2.for.inc13_crit_edge.us-lcssa.us-lcssa.us
				%cond.lcssa.ph = phi i32 [ %cond.lcssa.ph.ph, %for.cond2.for.inc13_crit_edge.us-lcssa.us-lcssa ], [ %cond.lcssa.ph.ph.us, %for.cond2.for.inc13_crit_edge.us-lcssa.us-lcssa.us ]
				br label %for.cond2.for.inc13_crit_edge

				for.cond2.for.inc13_crit_edge: ; preds = %for.cond2.for.inc13_crit_edge.us-lcssa, %for.cond2.for.inc13_crit_edge.us-lcssa.us
				%cond.lcssa = phi i32 [ %cond.lcssa.ph, %for.cond2.for.inc13_crit_edge.us-lcssa ], [ %cond.lcssa.ph.us, %for.cond2.for.inc13_crit_edge.us-lcssa.us ]
				store i32 %cond.lcssa, i32* @c, align 4
				br label %for.inc13

				; CHECK: [[for_inc13]]:
				; CHECK-NEXT: %[[indvars_iv_next]] = add nuw nsw i32 %[[indvars_iv]], 1
				; CHECK-NEXT: %[[exitcond4:.*]] = icmp ne i32 %[[indvars_iv]], -1
				; CHECK-NEXT: br i1 %[[exitcond4]], label %[[for_cond2_preheader]], label %[[for_end15:.*]]
				for.inc13: ; preds = %for.cond2.for.inc13_crit_edge, %for.cond2.preheader
				%inc14 = add i8 %storemerge15, 1
				%cmp = icmp ugt i8 %inc14, 50
				br i1 %cmp, label %for.cond2.preheader, label %for.end15

				; CHECK: [[for_end15]]:
				; CHECK-NEXT: ret void
				for.end15: ; preds = %for.inc13
				ret void
				}