This is an archive of the discontinued LLVM Phabricator instance.

Differential D132055

[LoopInterchange][PR57148] Ensure LCSSA form after loop interchange
Closed · Public

Authored by congzhe on Aug 17 2022, 11:34 AM.

Details

Reviewers

bmahjour
Meinersbur
uabelho

Group Reviewers

Restricted Project

Commits

rG22c91df52ccc: [LoopInterchange][PR57148] Ensure the correct form of IR after transformation

Summary

This patch resolves pr57148 (https://github.com/llvm/llvm-project/issues/57148), which is an assertion error due to loss of LCSSA form after interchange.

In cases where the LCSSA form is not maintained after interchange (e.g., interchanging the middle loop and the outermost loop in the test case added in this patch), we change the IR to LCSSA form again.

The test case added is the reproducer of pr57148.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

congzhe created this revision. Aug 17 2022, 11:34 AM

Herald added a project: Restricted Project. Aug 17 2022, 11:34 AM

Herald added a subscriber: hiraditya.

congzhe requested review of this revision. Aug 17 2022, 11:34 AM

Herald added a subscriber: llvm-commits. Aug 17 2022, 11:34 AM

congzhe edited the summary of this revision. Aug 17 2022, 11:35 AM

congzhe mentioned this in D124926: [LoopInterchange] New cost model for loop interchange.

Harbormaster completed remote builds in B181797: Diff 453368. Aug 17 2022, 1:38 PM

I have no idea if this is the right fix, but I've verified that it solves the problem that I reported in pr 57148.
Thanks!

However, that was a reduced version of the problem, and if I try the original non-reduced version (unfortunately for an out-of-tree target) I get another failure with this patch.
So it might be a step in the right direction, but there are more problems lurking.

A reduced version of the new problem is

opt -passes="loop-interchange" bbi-72571_2.ll -o /dev/null

which results in

Instruction does not dominate all uses!
  %i.166.lcssa = phi i16 [ %3, %vector.body85.split ]
  %arrayidx60 = getelementptr inbounds [2 x [4 x i32]], ptr @c, i16 0, i16 %i.166.lcssa, i16 %j.165
LLVM ERROR: Broken module found, compilation aborted!

Attachment: bbi-72571_2.ll (1 KB)

bmahjour added inline comments. Aug 22 2022, 9:53 AM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
575–576	can outer loop still be non-lcssa?
576	why not consider innerloop also?

In reply to @uabelho (D132055#3734791):

The problem in bbi-72571_2.ll is different from the one in the original bbi-72571.ll. The problem and its solution are described below.

The problem is that if we have a loop nest like this:

for.outermost.header:
  %phi.outermost = ...

for.middle.header:
  %phi.middle= ...
  use of %phi.outermost

for.innermost:
  ....

If we interchange the outermost and the middle loop, it would become:

// the new outermost loop header
for.middle.header:
  %phi.middle= ...
  use of %phi.outermost

// the new middle loop header
for.outermost.header:
  %phi.outermost = ...

for.innermost:
  ....

We then end up with a use of %phi.outermost before its definition.

The solution is to split the inner loop header into two basic blocks, one of which contains only the phi instructions. We already did this splitting for the innermost loop header, which is usually also the innermost loop latch (a single BB that contains phis and other instructions). With this patch we do the splitting for non-innermost loops as well.

For the case above this is how we do the splitting after this patch:

for.outermost.header:
  %phi.outermost = ...

for.middle.header:
  %phi.middle= ...

for.middle.header.split:
  use of %phi.outermost

for.innermost:
  ....

And now we can correctly interchange them:

// the new outermost loop header
for.middle.header:
  %phi.middle= ...

// the new middle loop header
for.outermost.header:
  %phi.outermost = ...

for.middle.header.split:
  use of %phi.outermost

for.innermost:
  ....
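
The use-before-def hazard and the effect of splitting the header can be sketched with a toy model. This is a minimal illustration and not LLVM code: the `Inst` and `Block` structs and all function names are hypothetical, and a straight-line chain of blocks stands in for the real control-flow graph, so "defined earlier in the walk" only approximates dominance.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy types, not LLVM's: an instruction optionally defines one
// value and reads zero or more values.
struct Inst {
  std::string def;               // empty if the instruction defines nothing
  std::vector<std::string> uses; // values read by the instruction
};

struct Block {
  std::string name;
  std::vector<Inst> insts;
};

// Walks the chain in order and returns the first value that is read before
// it has been defined, or an empty string if every use follows its def.
std::string firstUseBeforeDef(const std::vector<Block> &chain) {
  std::set<std::string> defined;
  for (const Block &bb : chain) {
    for (const Inst &inst : bb.insts) {
      for (const std::string &use : inst.uses)
        if (defined.count(use) == 0)
          return use;
      if (!inst.def.empty())
        defined.insert(inst.def);
    }
  }
  return "";
}

// Block order after interchange when the middle header is NOT split: the
// use of %phi.outermost travels with the phi of the middle loop.
std::vector<Block> interchangedWithoutSplit() {
  return {
      Block{"for.middle.header",
            {Inst{"%phi.middle", {}}, Inst{"", {"%phi.outermost"}}}},
      Block{"for.outermost.header", {Inst{"%phi.outermost", {}}}},
      Block{"for.innermost", {}},
  };
}

// Block order after interchange when the non-phi part of the middle header
// was first split off into for.middle.header.split.
std::vector<Block> interchangedWithSplit() {
  return {
      Block{"for.middle.header", {Inst{"%phi.middle", {}}}},
      Block{"for.outermost.header", {Inst{"%phi.outermost", {}}}},
      Block{"for.middle.header.split", {Inst{"", {"%phi.outermost"}}}},
      Block{"for.innermost", {}},
  };
}
```

In this model, `firstUseBeforeDef(interchangedWithoutSplit())` reports `%phi.outermost`, while the split variant reports no violation, which matches the two interchanged listings above.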

bbi-72571_2.ll is added to llvm/test/Transforms/LoopInterchange/pr57148.ll as another test case. The other test change, in llvm/test/Transforms/LoopInterchange/pr43176-move-to-new-latch.ll, is only a simpler and cleaner arrangement of BBs; there is no functional change.

congzhe added inline comments. Sep 19 2022, 7:56 PM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
576	Thanks for the comment! I've now used `formLCSSARecursively()` and removed the LCSSA assertion check for both inner and outer loops.
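
As a rough illustration of why recursive LCSSA formation can produce chained phis such as the `ADD61_LCSSA` and `ADD61_LCSSA_LCSSA` values checked in pr57148.ll, the sketch below models only the naming: a value that escapes several loop levels gets one exit phi per level it leaves. The function `lcssaPhiChain` and its depth parameters are hypothetical and not part of LLVM; the real `formLCSSARecursively()` works on `Loop` objects and inserts phi nodes in loop exit blocks.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the names of the exit phis needed when a value
// defined at loop depth `defDepth` is used at the shallower depth `useDepth`
// (depth 0 means outside all loops). One ".lcssa" suffix is added per loop
// level that the value leaves.
std::vector<std::string> lcssaPhiChain(const std::string &value,
                                       unsigned defDepth, unsigned useDepth) {
  std::vector<std::string> phis;
  std::string current = value;
  for (unsigned depth = defDepth; depth > useDepth; --depth) {
    current += ".lcssa";
    phis.push_back(current);
  }
  return phis;
}
```

For example, `lcssaPhiChain("%add61", 3, 1)` yields `%add61.lcssa` followed by `%add61.lcssa.lcssa`, while a value used at the depth where it is defined needs no phi.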

Harbormaster completed remote builds in B187672: Diff 461456. Sep 19 2022, 9:51 PM

LGTM

This revision is now accepted and ready to land. Sep 21 2022, 8:18 AM

LGTM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1355–1357	[suggestion] Update comment for condition: "Ensure the inner loop phi nodes have a separate basic block."

congzhe updated this revision to Diff 462074. Sep 21 2022, 8:31 PM

congzhe added inline comments.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1355–1357	Thanks, I've updated the patch accordingly and I will land it shortly.

Harbormaster completed remote builds in B188100: Diff 462074. Sep 21 2022, 8:33 PM

This revision was landed with ongoing or failed builds. Sep 21 2022, 9:21 PM

Closed by commit rG22c91df52ccc: [LoopInterchange][PR57148] Ensure the correct form of IR after transformation (authored by congzhe).

This revision was automatically updated to reflect the committed changes.

congzhe added a commit: rG22c91df52ccc: [LoopInterchange][PR57148] Ensure the correct form of IR after transformation.

Hello @congzhe ,

I think I'm seeing a miscompile with this patch:

opt "-passes=function(loop(loop-interchange))" bbi-74005_x86.ll -S -o -

Now I don't know how this is supposed to work but it looks to me that in the input code we read

%arrayidx14.promoted.i = load i32, ptr %arrayidx14.i, align 1

then do 512 rounds of the inner loop and add the read elements, and then we store what we have so far:

%18 = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %16)
store i32 %18, ptr %arrayidx14.i, align 1

But after loop-interchange we load, then add, but then we don't do the store, and instead do the load and execute the inner loop again? So it looks like we throw away the calculated addition and just read the same values again?
Or am I missing something here?

Attachment: bbi-74005_x86.ll (2 KB)
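
The concern raised in this comment can be made concrete with a small model. This is a sketch based only on the description in the comment, not on the attached IR: `cell` stands in for the memory at `%arrayidx14.i`, each row stands in for the elements added during one run of the inner loop, and both function names are hypothetical.

```cpp
#include <vector>

// Behaviour described for the input code: each outer iteration loads the
// cell, adds the inner-loop elements, and stores the sum back.
int accumulateWithStore(int cell, const std::vector<std::vector<int>> &rows) {
  for (const std::vector<int> &row : rows) {
    int acc = cell; // load
    for (int v : row)
      acc += v;     // inner-loop reduction
    cell = acc;     // store
  }
  return cell;
}

// Behaviour described for the transformed code: the store is missing, so each
// outer iteration reloads the unchanged cell and the sum is thrown away.
int accumulateWithoutStore(int cell, const std::vector<std::vector<int>> &rows) {
  for (const std::vector<int> &row : rows) {
    int acc = cell; // load of the same, never-updated value
    for (int v : row)
      acc += v;
    (void)acc;      // result discarded, no store
  }
  return cell;
}
```

With a starting value of 10 and rows {1, 2} and {3, 4}, the first function returns 20 and the second returns 10. A difference of this kind is what would make the transformation a miscompile rather than a reordering.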

In reply to @uabelho (D132055#3814831):

Hi Mikael, thanks for finding this. I'll take a look at the issue.

congzhe mentioned this in D134930: [LoopInterchange] Do not interchange when a reduction phi in all subloops of the outer loop is not recognizable. Sep 29 2022, 5:46 PM

In reply to @congzhe (D132055#3817047):

Posted D134930 to fix it.

congzhe mentioned this in rG75b33d6bd518: [LoopInterchange] Check phis in all subloops. Nov 3 2022, 9:22 PM

Revision Contents

Path                                                                 Size
llvm/lib/Transforms/Scalar/LoopInterchange.cpp                       12 lines
llvm/test/Transforms/LoopInterchange/pr43176-move-to-new-latch.ll    8 lines
llvm/test/Transforms/LoopInterchange/pr57148.ll                      168 lines

Diff 462079

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

[566 lines not shown; enclosing context: ORE->emit([&]() {]
                                 InnerLoop->getHeader())
              << "Loop interchanged with enclosing loop.";
     });
 
     LoopInterchangeTransform LIT(OuterLoop, InnerLoop, SE, LI, DT, LIL);
     LIT.transform();
     LLVM_DEBUG(dbgs() << "Loops interchanged.\n");
     LoopsInterchanged++;
 
-    assert(InnerLoop->isLCSSAForm(*DT) &&
-           "Inner loop not left in LCSSA form after loop interchange!");
-    assert(OuterLoop->isLCSSAForm(*DT) &&
-           "Outer loop not left in LCSSA form after loop interchange!");
+    llvm::formLCSSARecursively(*OuterLoop, *DT, LI, SE);
 
     return true;
   }
 };
 
 } // end anonymous namespace
 
 bool LoopInterchangeLegality::containsUnsafeInstructions(BasicBlock *BB) {
   return any_of(*BB, [](const Instruction &I) {
[760 lines not shown; enclosing context: Instruction *CondI = dyn_cast<Instruction>(]
         cast<BranchInst>(InnerLoop->getLoopLatch()->getTerminator())
             ->getCondition());
     if (CondI)
       WorkList.insert(CondI);
     MoveInstructions();
     for (Instruction *InnerIndexVar : InnerIndexVarList)
       WorkList.insert(cast<Instruction>(InnerIndexVar));
     MoveInstructions();
+  }
 
-  // Splits the inner loops phi nodes out into a separate basic block.
+  // Ensure the inner loop phi nodes have a separate basic block.
   BasicBlock *InnerLoopHeader = InnerLoop->getHeader();
+  if (InnerLoopHeader->getFirstNonPHI() != InnerLoopHeader->getTerminator()) {
     SplitBlock(InnerLoopHeader, InnerLoopHeader->getFirstNonPHI(), DT, LI);
     LLVM_DEBUG(dbgs() << "splitting InnerLoopHeader done\n");
   }
 
   // Instructions in the original inner loop preheader may depend on values
   // defined in the outer loop header. Move them there, because the original
   // inner loop preheader will become the entry into the interchanged loop nest.
   // Currently we move all instructions and rely on LICM to move invariant
[405 lines not shown]

llvm/test/Transforms/LoopInterchange/pr43176-move-to-new-latch.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -loop-interchange -cache-line-size=64 -verify-loop-lcssa -verify-dom-info -S %s | FileCheck %s
 
 @b = external dso_local global [5 x i32], align 16
 
 define void @test1() {
 ; CHECK-LABEL: @test1(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[FOR_BODY2_PREHEADER:%.*]]
 ; CHECK: for.body.preheader:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[INC41:%.*]] = phi i32 [ [[INC4:%.*]], [[FOR_INC3:%.*]] ], [ undef, [[FOR_BODY_PREHEADER:%.*]] ]
 ; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[INC41]] to i64
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [5 x i32], [5 x i32]* @b, i64 0, i64 [[IDXPROM]]
-; CHECK-NEXT: br label [[FOR_BODY2_SPLIT:%.*]]
+; CHECK-NEXT: br label [[FOR_INC:%.*]]
 ; CHECK: for.body2.preheader:
 ; CHECK-NEXT: br label [[FOR_BODY2:%.*]]
 ; CHECK: for.body2:
 ; CHECK-NEXT: [[LSR_IV:%.*]] = phi i32 [ [[TMP1:%.*]], [[FOR_INC_SPLIT:%.*]] ], [ 1, [[FOR_BODY2_PREHEADER]] ]
 ; CHECK-NEXT: br label [[FOR_BODY_PREHEADER]]
-; CHECK: for.body2.split:
-; CHECK-NEXT: br label [[FOR_INC:%.*]]
 ; CHECK: for.inc:
 ; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
 ; CHECK-NEXT: store i32 undef, i32* [[ARRAYIDX]], align 4
 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[LSR_IV]], 4
 ; CHECK-NEXT: [[LSR_IV_NEXT:%.*]] = add nuw nsw i32 [[LSR_IV]], 1
 ; CHECK-NEXT: br label [[FOR_COND1_FOR_END_CRIT_EDGE:%.*]]
 ; CHECK: for.inc.split:
 ; CHECK-NEXT: [[TMP1]] = add nuw nsw i32 [[LSR_IV]], 1
[43 lines not shown]
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[FOR_BODY2_PREHEADER:%.*]]
 ; CHECK: for.body.preheader:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[INC41:%.*]] = phi i32 [ [[INC4:%.*]], [[FOR_INC3:%.*]] ], [ undef, [[FOR_BODY_PREHEADER:%.*]] ]
 ; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[INC41]] to i64
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [5 x i32], [5 x i32]* @b, i64 0, i64 [[IDXPROM]]
-; CHECK-NEXT: br label [[FOR_BODY2_SPLIT:%.*]]
+; CHECK-NEXT: br label [[FOR_INC:%.*]]
 ; CHECK: for.body2.preheader:
 ; CHECK-NEXT: br label [[FOR_BODY2:%.*]]
 ; CHECK: for.body2:
 ; CHECK-NEXT: [[LSR_IV:%.*]] = phi i32 [ [[TMP1:%.*]], [[FOR_INC_SPLIT:%.*]] ], [ 1, [[FOR_BODY2_PREHEADER]] ]
 ; CHECK-NEXT: br label [[FOR_BODY_PREHEADER]]
-; CHECK: for.body2.split:
-; CHECK-NEXT: br label [[FOR_INC:%.*]]
 ; CHECK: for.inc:
 ; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[LSR_IV]], 4
 ; CHECK-NEXT: [[CMP_ZEXT:%.*]] = zext i1 [[CMP]] to i32
 ; CHECK-NEXT: store i32 [[CMP_ZEXT]], i32* [[ARRAYIDX]], align 4
 ; CHECK-NEXT: [[LSR_IV_NEXT:%.*]] = add nuw nsw i32 [[LSR_IV]], 1
 ; CHECK-NEXT: br label [[FOR_COND1_FOR_END_CRIT_EDGE:%.*]]
 ; CHECK: for.inc.split:
[42 lines not shown]

llvm/test/Transforms/LoopInterchange/pr57148.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -passes=loop-interchange -cache-line-size=4 -verify-dom-info -verify-loop-info -verify-scev -verify-loop-lcssa -S | FileCheck %s

				; Make sure the loops are in LCSSA form after loop interchange,
				; and loop interchange does not hit assertion errors and crash.

				target triple = "x86_64-unknown-linux-gnu"

				@b = external global [512 x [4 x i32]], align 1
				@c = external global [2 x [4 x i32]], align 1

				define void @test1() {
				; CHECK-LABEL: @test1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_COND37_PREHEADER_PREHEADER:%.*]]
				; CHECK: for.cond33.preheader.preheader:
				; CHECK-NEXT: br label [[FOR_COND33_PREHEADER:%.*]]
				; CHECK: for.cond33.preheader:
				; CHECK-NEXT: [[I_011:%.*]] = phi i16 [ [[INC69:%.*]], [[FOR_END67:%.*]] ], [ 0, [[FOR_COND33_PREHEADER_PREHEADER:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY42_SPLIT1:%.*]]
				; CHECK: for.body42.preheader:
				; CHECK-NEXT: br label [[FOR_BODY42:%.*]]
				; CHECK: for.cond37.preheader.preheader:
				; CHECK-NEXT: br label [[FOR_COND37_PREHEADER:%.*]]
				; CHECK: for.cond37.preheader:
				; CHECK-NEXT: [[J_010:%.*]] = phi i16 [ [[INC66:%.*]], [[FOR_END64:%.*]] ], [ 0, [[FOR_COND37_PREHEADER_PREHEADER]] ]
				; CHECK-NEXT: br label [[FOR_BODY42_PREHEADER:%.*]]
				; CHECK: for.body42:
				; CHECK-NEXT: [[K_09:%.*]] = phi i16 [ [[TMP1:%.*]], [[FOR_BODY42_SPLIT:%.*]] ], [ -512, [[FOR_BODY42_PREHEADER]] ]
				; CHECK-NEXT: br label [[FOR_COND33_PREHEADER_PREHEADER]]
				; CHECK: for.body42.split1:
				; CHECK-NEXT: [[SUB51:%.*]] = add nsw i16 [[K_09]], 512
				; CHECK-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds [512 x [4 x i32]], ptr @b, i16 0, i16 [[SUB51]], i16 [[J_010]]
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX55]], align 1
				; CHECK-NEXT: [[ADD61:%.*]] = add i32 undef, undef
				; CHECK-NEXT: [[INC63:%.*]] = add nsw i16 [[K_09]], 1
				; CHECK-NEXT: br label [[FOR_END67]]
				; CHECK: for.body42.split:
				; CHECK-NEXT: [[ADD61_LCSSA:%.*]] = phi i32 [ [[ADD61]], [[FOR_END67]] ]
				; CHECK-NEXT: [[TMP1]] = add nsw i16 [[K_09]], 1
				; CHECK-NEXT: br i1 true, label [[FOR_END64]], label [[FOR_BODY42]]
				; CHECK: for.end64:
				; CHECK-NEXT: [[ADD61_LCSSA_LCSSA:%.*]] = phi i32 [ [[ADD61_LCSSA]], [[FOR_BODY42_SPLIT]] ]
				; CHECK-NEXT: store i32 [[ADD61_LCSSA_LCSSA]], ptr undef, align 1
				; CHECK-NEXT: [[INC66]] = add nuw nsw i16 [[J_010]], 1
				; CHECK-NEXT: br i1 true, label [[FOR_COND75_PREHEADER:%.*]], label [[FOR_COND37_PREHEADER]]
				; CHECK: for.end67:
				; CHECK-NEXT: [[INC69]] = add nuw nsw i16 [[I_011]], 1
				; CHECK-NEXT: [[EXITCOND13_NOT:%.*]] = icmp eq i16 [[INC69]], 2
				; CHECK-NEXT: br i1 [[EXITCOND13_NOT]], label [[FOR_BODY42_SPLIT]], label [[FOR_COND33_PREHEADER]]
				; CHECK: for.cond75.preheader:
				; CHECK-NEXT: br label [[FOR_COND75:%.*]]
				; CHECK: for.cond75:
				; CHECK-NEXT: br label [[FOR_COND75]]
				;
				entry:
				br label %for.cond33.preheader

				for.cond33.preheader: ; preds = %for.end67, %entry
				%i.011 = phi i16 [ 0, %entry ], [ %inc69, %for.end67 ]
				br label %for.cond37.preheader

				for.cond37.preheader: ; preds = %for.end64, %for.cond33.preheader
				%j.010 = phi i16 [ 0, %for.cond33.preheader ], [ %inc66, %for.end64 ]
				br label %for.body42

				for.body42: ; preds = %for.body42, %for.cond37.preheader
				%k.09 = phi i16 [ -512, %for.cond37.preheader ], [ %inc63, %for.body42 ]
				%sub51 = add nsw i16 %k.09, 512
				%arrayidx55 = getelementptr inbounds [512 x [4 x i32]], ptr @b, i16 0, i16 %sub51, i16 %j.010
				%0 = load i32, ptr %arrayidx55, align 1
				%add61 = add i32 undef, undef
				%inc63 = add nsw i16 %k.09, 1
				br i1 true, label %for.end64, label %for.body42

				for.end64: ; preds = %for.body42
				store i32 %add61, ptr undef, align 1
				%inc66 = add nuw nsw i16 %j.010, 1
				br i1 true, label %for.end67, label %for.cond37.preheader

				for.end67: ; preds = %for.end64
				%inc69 = add nuw nsw i16 %i.011, 1
				%exitcond13.not = icmp eq i16 %inc69, 2
				br i1 %exitcond13.not, label %for.cond75, label %for.cond33.preheader

				for.cond75: ; preds = %for.cond75, %for.end67
				br label %for.cond75
				}


				; Make sure that we split the phi nodes in the middle loop header
				; into a separate basic block to avoid the situation where use of
				; the outermost indvar appears before its def after interchanging
				; the outermost and the middle loop. Otherwise loop interchange
				; would crash.

				define void @test2() {
				; CHECK-LABEL: @test2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_COND37_PREHEADER_PREHEADER:%.*]]
				; CHECK: for.cond33.preheader.preheader:
				; CHECK-NEXT: br label [[FOR_COND33_PREHEADER:%.*]]
				; CHECK: for.cond33.preheader:
				; CHECK-NEXT: [[I_166:%.*]] = phi i16 [ [[INC69:%.*]], [[FOR_INC68:%.*]] ], [ 0, [[FOR_COND33_PREHEADER_PREHEADER:%.*]] ]
				; CHECK-NEXT: [[ARRAYIDX60:%.*]] = getelementptr inbounds [2 x [4 x i32]], ptr @c, i16 0, i16 [[I_166]], i16 [[J_165:%.*]]
				; CHECK-NEXT: br label [[VECTOR_BODY85_SPLIT1:%.*]]
				; CHECK: for.cond37.preheader.preheader:
				; CHECK-NEXT: br label [[FOR_COND37_PREHEADER:%.*]]
				; CHECK: for.cond37.preheader:
				; CHECK-NEXT: [[J_165]] = phi i16 [ [[INC66:%.*]], [[MIDDLE_BLOCK80:%.*]] ], [ 0, [[FOR_COND37_PREHEADER_PREHEADER]] ]
				; CHECK-NEXT: br label [[FOR_COND37_PREHEADER_SPLIT:%.*]]
				; CHECK: for.cond37.preheader.split:
				; CHECK-NEXT: br label [[VECTOR_BODY85:%.*]]
				; CHECK: vector.body85:
				; CHECK-NEXT: [[INDEX86:%.*]] = phi i16 [ 0, [[FOR_COND37_PREHEADER_SPLIT]] ], [ [[TMP3:%.*]], [[VECTOR_BODY85_SPLIT:%.*]] ]
				; CHECK-NEXT: br label [[FOR_COND33_PREHEADER_PREHEADER]]
				; CHECK: vector.body85.split1:
				; CHECK-NEXT: [[TMP0:%.*]] = or i16 [[INDEX86]], 2
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [512 x [4 x i32]], ptr @b, i16 0, i16 [[TMP0]], i16 [[J_165]]
				; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP1]], align 1
				; CHECK-NEXT: [[INDEX_NEXT87:%.*]] = add nuw i16 [[INDEX86]], 4
				; CHECK-NEXT: br label [[FOR_INC68]]
				; CHECK: vector.body85.split:
				; CHECK-NEXT: [[TMP3]] = add nuw i16 [[INDEX86]], 4
				; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK80]], label [[VECTOR_BODY85]]
				; CHECK: middle.block80:
				; CHECK-NEXT: [[INC66]] = add nuw nsw i16 [[J_165]], 1
				; CHECK-NEXT: br i1 true, label [[FOR_COND75_PREHEADER:%.*]], label [[FOR_COND37_PREHEADER]]
				; CHECK: for.inc68:
				; CHECK-NEXT: [[INC69]] = add nuw nsw i16 [[I_166]], 1
				; CHECK-NEXT: [[EXITCOND77_NOT:%.*]] = icmp eq i16 [[INC69]], 2
				; CHECK-NEXT: br i1 [[EXITCOND77_NOT]], label [[VECTOR_BODY85_SPLIT]], label [[FOR_COND33_PREHEADER]]
				; CHECK: for.cond75.preheader:
				; CHECK-NEXT: unreachable
				;
				entry:
				br label %for.cond33.preheader

				for.cond33.preheader: ; preds = %for.inc68, %entry
				%i.166 = phi i16 [ %inc69, %for.inc68 ], [ 0, %entry ]
				br label %for.cond37.preheader

				for.cond37.preheader: ; preds = %middle.block80, %for.cond33.preheader
				%j.165 = phi i16 [ 0, %for.cond33.preheader ], [ %inc66, %middle.block80 ]
				%arrayidx60 = getelementptr inbounds [2 x [4 x i32]], ptr @c, i16 0, i16 %i.166, i16 %j.165
				br label %vector.body85

				vector.body85: ; preds = %vector.body85, %for.cond37.preheader
				%index86 = phi i16 [ 0, %for.cond37.preheader ], [ %index.next87, %vector.body85 ]
				%0 = or i16 %index86, 2
				%1 = getelementptr inbounds [512 x [4 x i32]], ptr @b, i16 0, i16 %0, i16 %j.165
				%2 = load i32, ptr %1, align 1
				%index.next87 = add nuw i16 %index86, 4
				br i1 undef, label %middle.block80, label %vector.body85

				middle.block80: ; preds = %vector.body85
				%inc66 = add nuw nsw i16 %j.165, 1
				br i1 undef, label %for.inc68, label %for.cond37.preheader

				for.inc68: ; preds = %middle.block80
				%inc69 = add nuw nsw i16 %i.166, 1
				%exitcond77.not = icmp eq i16 %inc69, 2
				br i1 %exitcond77.not, label %for.cond75.preheader, label %for.cond33.preheader

				for.cond75.preheader: ; preds = %for.inc68
				unreachable
				}