This is an archive of the discontinued LLVM Phabricator instance.


Differential D118102

[LoopInterchange] Detect output dependency of a store instruction with itself
ClosedPublic

Authored by congzhe on Jan 24 2022, 10:01 PM.


Details

Reviewers

Whitney
bmahjour
Meinersbur

Group Reviewers

Restricted Project

Commits

rGabc8ca65c3de: [LoopInterchange] Detect output dependency of a store instruction with itself

Summary

This patch is motivated by pr48057 (https://bugs.llvm.org/show_bug.cgi?id=48057), where an output dependency was not detected because loop interchange did not check a store instruction against itself.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

congzhe created this revision. Jan 24 2022, 10:01 PM

Herald added a subscriber: hiraditya. Jan 24 2022, 10:01 PM

congzhe requested review of this revision. Jan 24 2022, 10:01 PM

Herald added a project: Restricted Project. Jan 24 2022, 10:01 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B145397: Diff 402761. Jan 26 2022, 1:04 PM

As mentioned in the LoopWG call, I don't think this has anything to do with dominance relations or diverging branches. Instead, store i16 %conv9.i, i16* @e may be executed multiple times, and the last value written to @e must be the value stored after the loop (the problem should occur even if the store is executed every time, i.e. not diverging). This is a classic output dependency (write-after-write). I suggest looking at the dependence analysis to see why it missed this case.

In D118102#3280415, @Meinersbur wrote:

As mentioned in the LoopWG call, I don't think this has anything to do with dominance relations or diverging branches. Instead, store i16 %conv9.i, i16* @e may be executed multiple times, and the last value written to @e must be the value stored after the loop (the problem should occur even if the store is executed every time, i.e. not diverging). This is a classic output dependency (write-after-write). I suggest looking at the dependence analysis to see why it missed this case.

Thanks Michael! I looked further into it, and I agree that there is an output dependency that writes into @e in each iteration. Nevertheless, if the store is executed every time (not diverging), unless I'm mistaken, this output dependency does not invalidate loop interchange, since we only care about the final value written into @e, which is the value of array b[d][0] at the final iteration. This value, b[d=0][0], does not change before and after interchange. So the output dependence in this example seems to be okay; please correct me if I'm wrong.

Regarding dependency analysis: for this example (e.g., test1() in pr48057.ll), no dependence is detected, since for all memory instructions (on lines 57, 63, and 65 in pr48057.ll), the memory locations do not alias.

I'd appreciate it if you could let me know whether what I said makes sense to you, thank you very much!

Couldn't the same problem happen in theory without control flow divergence? For example consider a loop like this:

for (c = 0; c <= 7; c++) {
  for (d = 4; d; d--)
    e = ((b[d+2][c]) ? b[d][0] : e);
}

where the ternary operator turns into a select instruction in LLVM IR.

In D118102#3280506, @congzhe wrote:

Nevertheless, if the store is executed every time (not diverging), unless I'm mistaken, this output dependency does not invalidate loop interchange, since we only care about the final value written into @e, which is the value of array b[d][0] at the final iteration. This value, b[d=0][0], does not change before and after interchange. So the output dependence in this example seems to be okay; please correct me if I'm wrong.

Traditional dependency analysis only considers pairwise dependencies ("hazards") between two executions, formulated as "StmtA(i) and StmtB(j) are dependent if they may access the same memory and at least one of them is a write" (read-after-read is not a dependency). In principle, an output dependency does not matter if the location is overwritten anyway before being used. However, DependenceAnalysis doesn't even have an API to consider that. It also does not distinguish between an "unconditional overwrite" (called a "kill" in the literature) and a conditional or partial write:

bool Dependence::isOutput() const {
  return Src->mayWriteToMemory() && Dst->mayWriteToMemory();
}
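The pairwise definition above can be made concrete with a small standalone C++ model. This is a sketch, not LLVM code: `Access` and `classify` are hypothetical names, and only the may-read/may-write flags are modelled, in the same spirit as the `isOutput()` predicate above, which only asks whether both instructions may write. Nothing in the definition requires Src and Dst to be different instructions; the same store executed in two different iterations qualifies.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a memory instruction: only its
// read/write effects matter for classifying a hazard.
struct Access {
  bool mayRead;
  bool mayWrite;
};

// Classify the hazard between an earlier access Src and a later access Dst
// that may touch the same memory, using the textbook definitions:
//   output = write-after-write, flow = read-after-write,
//   anti = write-after-read, input = read-after-read (not a real hazard).
std::string classify(const Access &Src, const Access &Dst) {
  if (Src.mayWrite && Dst.mayWrite) return "output";
  if (Src.mayWrite && Dst.mayRead) return "flow";
  if (Src.mayRead && Dst.mayWrite) return "anti";
  if (Src.mayRead && Dst.mayRead) return "input";
  return "none";
}
```

For example, with a store modelled as Access{false, true}, classify(store, store) yields "output": the write-after-write hazard of a store with itself.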

Regarding dependency analysis: for this example (e.g., test1() in pr48057.ll), no dependence is detected, since for all mem instructions (on lines 57, 63, 65 in pr48057.ll), the memory locations do not alias.

store i16 %conv9.i, i16* @e hazards with itself from different iterations (same memory, both are "write"). It would not be very different if it was an array and indexed e[f(d,c)] where f is a speculable/const function (and potentially always returns 0). If no dependence is detected, I'd consider it a bug.

llvm/test/Transforms/LoopInterchange/pr48057.ll
Lines 13–22 (On Diff #402761): Instead of using the fuzzer output, it would be nice to reduce it to something closer to what a user would write (e.g. avoid globals, use explicit loop variables, etc.). E.g.:

void test1(char b[][5], int e) {
  for (int c = 0; c < 8; ++c)
    for (int d = 0; d < 5; ++d)
      if (b[c][d])
        e = c + d;
}
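To see why the last write to a conditionally assigned scalar matters, the reduced test1() above can be simulated directly in C++. This is a minimal sketch: the bounds 8 and 5 come from the suggestion above, while the contents of b (non-zero only at b[0][4] and b[7][0]) are arbitrary illustrative values, and returning e is an assumption made so the final value is observable. Both functions execute the same set of (c, d) iterations; only the order differs.

```cpp
#include <cassert>

constexpr int NumC = 8; // trip count of the c loop
constexpr int NumD = 5; // trip count of the d loop

// Original nest: c outer, d inner. The last (c, d) with a non-zero
// b[c][d] decides the value of e that survives the loop.
int runOriginal(char b[NumC][NumD], int e) {
  for (int c = 0; c < NumC; ++c)
    for (int d = 0; d < NumD; ++d)
      if (b[c][d])
        e = c + d;
  return e;
}

// Interchanged nest: d outer, c inner. Same iterations, different order.
int runInterchanged(char b[NumC][NumD], int e) {
  for (int d = 0; d < NumD; ++d)
    for (int c = 0; c < NumC; ++c)
      if (b[c][d])
        e = c + d;
  return e;
}
```

With that b and an initial e of -1, runOriginal returns 7 (from c = 7, d = 0) while runInterchanged returns 4 (from d = 4, c = 0): interchanging changes the result even though the only conflicting access is the write to e paired with itself.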

In D118102#3285218, @bmahjour wrote:
Couldn't the same problem happen in theory without control flow divergence? For example consider a loop like this:
for (c = 0; c <= 7; c++) {
  for (d = 4; d; d--)
    e = ((b[d+2][c]) ? b[d][0] : e);
}
where the ternary operator turns into a select instruction in LLVM IR.

Thanks for the comment! IIUC the source code you wrote would likely result in control-flow divergence as well, even if it is in "select" form in the source code. The inner loop would look like the following, where e is represented by the phi node %cond:

for.body2:                                        ; preds = %for.cond1.preheader, %cond.end
  %indvars.iv = phi i64 [ 4, %for.cond1.preheader ], [ %indvars.iv.next, %cond.end ]
  %cond12 = phi i16 [ %cond1.lcssa57, %for.cond1.preheader ], [ %cond, %cond.end ]
  %4 = add nuw nsw i64 %indvars.iv, 2
  %arrayidx4 = getelementptr inbounds [8 x [8 x i8]], [8 x [8 x i8]]* @b, i64 0, i64 %4, i64 %indvars.iv9
  %5 = load i8, i8* %arrayidx4, align 1, !tbaa !9
  %tobool5.not = icmp eq i8 %5, 0
  br i1 %tobool5.not, label %cond.end, label %cond.true

cond.true:                                        ; preds = %for.body2
  %arrayidx8 = getelementptr inbounds [8 x [8 x i8]], [8 x [8 x i8]]* @b, i64 0, i64 %indvars.iv, i64 0
  %6 = load i8, i8* %arrayidx8, align 8, !tbaa !9
  %conv9 = sext i8 %6 to i16
  br label %cond.end

cond.end:                                         ; preds = %for.body2, %cond.true
  %cond = phi i16 [ %conv9, %cond.true ], [ %cond12, %for.body2 ]
  %indvars.iv.next = add nsw i64 %indvars.iv, -1
  %tobool.not = icmp eq i64 %indvars.iv.next, 0
  br i1 %tobool.not, label %for.inc12, label %for.body2, !llvm.loop !10

Secondly, I've manually changed the IR to use "select" instructions so that it is not divergent (essentially doing if-conversion). In that case loop interchange bails out with "unable to recognize reduction or induction phis": e is a phi node and the operation on e looks like a "reduction", but the "reduction operator" is the select instruction, so it is not a real reduction and loop interchange bails. Hence I think this case is under control.

Just to summarize, IMO we would need to handle output dependency under control-flow dependence anyway. I'm looking forward to your thoughts.

In D118102#3287827, @Meinersbur wrote:

store i16 %conv9.i, i16* @e hazards with itself from different iterations (same memory, both are "write"). It would not be very different if it was an array and indexed e[f(d,c)] where f is a speculable/const function (and potentially always returns 0). If no dependence is detected, I'd consider it a bug.

Thanks Michael. When we check dependencies in loop interchange, we check instructions in pairs, and we did not check an instruction against itself. That's why we did not detect the output dependency where store i16 %conv9.i, i16* @e hazards with itself. I've updated the check so I could detect the output dependency now (patch not updated yet).

In principle an output-dependency does not matter if overwritten anyway before being used. However, DependenceAnalysis doesn't even have an API to consider that. It also does not make a difference between "unconditionally overwrite" (in literature called a "kill") and a conditional or partial write:

I agree with you. The problem in this bug is the "conditional or partial write" you mentioned, in other words, output dependency under control-flow divergence. I can add code to detect "output dependency under control-flow divergence" in loop interchange, or in DependenceAnalysis.cpp. IIUC you prefer to do it in DependenceAnalysis.cpp and add a new API there, so I'm working on a design, and I'd appreciate it if we could discuss it further, maybe at the next loop opt meeting.

In D118102#3305242, @congzhe wrote:

I've updated the check so I could detect the output dependency now (patch not updated yet).

I don't see the update yet.

The problem in this bug is the "conditional or partial write" you mentioned, in other words, output dependency under control-flow divergence. I can add code to detect "output dependency under control-flow divergence" in loop interchange, or in DependenceAnalysis.cpp.

The problem is more complex than that. It's not just control-flow divergence (by which I think you mean the write being conditional; if it is conditional on a loop-invariant condition, it could be solved with loop unswitching), but any kind of "non-kill" memory write (e.g. a partial overwrite of only some of the bytes, unpredictable/non-constant indices or base addresses, reuse of previous values in e.g. an overlapping memmove call, etc.) and potential reads before the kill instruction. If the address can be accessed by multiple instructions, this is an NP-complete problem.

You are free to tackle that problem, maybe just for special cases, but I suggest trying that in a separate patch.

congzhe updated this revision to Diff 407018. Feb 8 2022, 5:58 PM

congzhe retitled this revision from [LoopInterchange] Prevent interchange with unsafe control-flow divergence inside inner loops (PR48057) to [LoopInterchange] WIP: Prevent interchange with unsafe control-flow divergence inside inner loops (PR48057).

In D118102#3306204, @Meinersbur wrote:

In D118102#3305242, @congzhe wrote:

I've updated the check so I could detect the output dependency now (patch not updated yet).

I don't see the update yet.

I've updated the patch for now so that, during dependence analysis in loop interchange, it takes into account a store instruction paired with itself and can thus determine the output dependency. If we detect a "scalar" output dependency under control-flow divergence, we just bail.

Nevertheless, this change exposed another problem. It fails three lit tests under loop interchange: lcssa-preheader.ll, perserve-lcssa.ll, and pr45743-move-from-inner-preheader.ll. They fail because there is a store instruction inside the innermost loop like this:

; inside the inner loop, @Array is a 2D array
%Address  = gep @Array, 0, InnerIndvar, OuterIndvar
store 0,  %Address

We should be able to determine that there is no loop-carried output dependency between this store and itself, since its distance vector should be all zeros (direction [= =]). However, DependenceAnalysis (DA) returns a direction vector of [* *] for this output dependency, meaning it does not know the dependence directions, so we bail out of loop interchange when we should not.
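One way to check the claim that this store has no loop-carried dependence on itself is to brute-force the direction vectors over a small iteration space. This C++ sketch assumes a 2-deep nest with i as the outer and j as the inner induction variable and a 4 x 4 array, so Array[InnerIndvar][OuterIndvar] is modelled as the linear address j * 4 + i; selfDirections is a hypothetical helper, not part of DependenceAnalysis. For contrast, a store to a single scalar location is modelled as an address that is always 0.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Brute-force the direction vectors of a store paired with itself in a
// 2-deep nest (i = outer, j = inner). For every ordered pair of iterations
// (src executes no later than dst) that writes the same address, record one
// direction per loop level: '<', '=' or '>' comparing src to dst.
std::set<std::string> selfDirections(int N, int M,
                                     const std::function<int(int, int)> &addr) {
  auto dir = [](int a, int b) { return a < b ? '<' : (a == b ? '=' : '>'); };
  std::set<std::string> dirs;
  for (int i1 = 0; i1 < N; ++i1)
    for (int j1 = 0; j1 < M; ++j1)
      for (int i2 = 0; i2 < N; ++i2)
        for (int j2 = 0; j2 < M; ++j2) {
          // Lexicographic execution order of the original nest.
          bool srcFirst = i1 < i2 || (i1 == i2 && j1 <= j2);
          if (srcFirst && addr(i1, j1) == addr(i2, j2))
            dirs.insert(std::string{dir(i1, i2), dir(j1, j2)});
        }
  return dirs;
}
```

For the array store only the trivial same-iteration pair exists, giving the single vector "==". The scalar store collides across every pair of iterations and produces "==", "=<", "<<", "<=" and "<>", which a summarizing analysis can only describe with much less precise entries such as *.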

Harbormaster completed remote builds in B148395: Diff 407018. Feb 8 2022, 8:52 PM

congzhe updated this revision to Diff 407746. Feb 10 2022, 7:19 PM

congzhe retitled this revision from [LoopInterchange] WIP: Prevent interchange with unsafe control-flow divergence inside inner loops (PR48057) to [LoopInterchange] Detect output dependency of a store instruction with itself.

congzhe edited the summary of this revision.

According to our discussion, I've updated the test cases and retitled this patch to "Detect output dependency of a store instruction with itself". I revised three test files so that dependence analysis behaves more reasonably: lcssa-preheader.ll, perserve-lcssa.ll, and pr45743-move-from-inner-preheader.ll.

I removed the legacy test case lcssa_08() in lcssa-preheader.ll, since it says we should not move the instruction %wide.trip.count = zext i32 %m to i64 into BB outer.header, otherwise LCSSA would break. The original patch (https://reviews.llvm.org/D75943) claimed it should be moved into outer.preheader, which was the initial behavior of this test.

In the early days our capability to deal with LCSSA phis was limited. More recently, after https://reviews.llvm.org/rG8393b9fd1f36d9273fa0720872e3996495aacc1c, we do move %wide.trip.count = zext i32 %m to i64 into BB outer.header, because we now handle LCSSA phis more appropriately. This to some extent invalidates the original test lcssa_08(), hence I removed it in this patch.

Harbormaster completed remote builds in B148891: Diff 407746. Feb 10 2022, 8:04 PM

LGTM, thank you

This revision is now accepted and ready to land. Mar 2 2022, 6:52 AM

Herald added a project: Restricted Project. Mar 2 2022, 6:52 AM

bmahjour requested changes to this revision. Mar 2 2022, 6:56 AM

bmahjour added inline comments.

llvm/test/Transforms/LoopInterchange/lcssa-preheader.ll
Line 5: This is a valid test case to interchange, and it looks like after this patch it won't be interchanged anymore. I believe @congzhe is looking into seeing if we can treat scalar output dependencies as interchange-preventing so that we can still get scenarios like this one.

This revision now requires changes to proceed. Mar 2 2022, 6:56 AM

llvm/test/Transforms/LoopInterchange/lcssa-preheader.ll
Line 5: Thanks Bardia, I was likely not clear enough during our last discussion. Just to clarify: the fact that this test case won't be interchanged after this patch is due to the insufficient capability of DependenceAnalysis/Delinearization, which gives [* *] as the direction vector. Remember that we now detect output dependencies and take the store instruction into consideration, and for that store it returns [* *]. IIUC this problem is independent of the current patch: this patch fixes the deficiency that output dependencies are not considered, and if it exposes problems in other areas, maybe I could fix them in other patches? Previously I thought to remove this test since it does not seem very meaningful. But actually, with a trivial change in the test (similar to the trivial changes in the two other tests in this patch), DependenceAnalysis works properly and this test can be interchanged. I'll update the patch accordingly.

Regarding your comment `if we can treat scalar output dependencies as interchange-preventing so that we can still get scenarios like this one`: I would like to first clarify again that scalar dependency is not related to this test/scenario; what fails this test is that dependence analysis gives [* *] as the direction vector. I've also followed your suggestion and checked that if we treat scalar output dependencies as interchange-preventing, the following 10 tests fail. I looked into a few of them, and IMHO scalar output dependencies seem to be legal in terms of interchange. Please correct me if I'm wrong.

Failed Tests (10):
  LLVM :: Transforms/LoopInterchange/inner-only-reductions.ll
  LLVM :: Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
  LLVM :: Transforms/LoopInterchange/interchange-no-deps.ll
  LLVM :: Transforms/LoopInterchange/lcssa.ll
  LLVM :: Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll
  LLVM :: Transforms/LoopInterchange/phi-ordering.ll
  LLVM :: Transforms/LoopInterchange/pr43176-move-to-new-latch.ll
  LLVM :: Transforms/LoopInterchange/pr43797-lcssa-for-multiple-outer-loop-blocks.ll
  LLVM :: Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll
  LLVM :: Transforms/LoopInterchange/vector-gep-operand.ll

Updated a previously failing test case in lcssa-preheader.ll.

Previously this test could not be interchanged since dependence analysis would return [* *] as the direction vector for the output dependency detected; now it can be interchanged.

Harbormaster completed remote builds in B152289: Diff 412596. Mar 2 2022, 8:56 PM

bmahjour added inline comments. Mar 3 2022, 6:45 AM

llvm/test/Transforms/LoopInterchange/lcssa-preheader.ll
Line 5:
> ... this test case won't be interchanged after this patch is due to the insufficient capability of DependenceAnalysis/Delinearization which gives [* *] as the direction vector. Remember that we now detect output dependency and take the store instruction into consideration, which results in a direction vector of [* *].

Sure, DependenceAnalysis may have issues, but I'm wondering if those same "issues" are the reason this patch prevents interchange for the motivating test case. For instance, for `lcssa_08` the reason we see [* *] is delinearization issues, which could be improved in the future (you can test and verify this with `-da-disable-delinearization-checks`). I think keeping this test and adding `-da-disable-delinearization-checks` is better than removing it altogether. Also, with `-da-disable-delinearization-checks`, does the motivating test case still pass with this patch?

> I've also followed your suggestion and checked that if we treat scalar output dependencies as interchange-preventing, we would fail the following 10 tests.

Thanks for trying this.

> I looked into a few of them and IMHO it seems that scalar output dependencies seem to be legal in terms of interchange

Could you elaborate on why you think they should not be interchange-preventing? This whole patch is premised on detecting output dependencies and treating them as interchange-preventing.

congzhe added inline comments. Mar 4 2022, 9:47 AM

llvm/test/Transforms/LoopInterchange/lcssa-preheader.ll
Line 5:
> I think keeping this test and adding -da-disable-delinearization-checks is better than removing it altogether.

I fully agree; we can interchange this case after adding -da-disable-delinearization-checks. I'll update the patch shortly.

> I'm wondering if those same "issues" are the reason this patch prevents interchange for the motivating test case.

Just to clarify a bit, the current patch does not prevent interchange for any case; what this patch does is fix the deficiency that we did not detect output dependencies before. I'm hoping this is one step closer to solving the motivating bug.

> Could you elaborate on why you think they should not be interchange preventing? This whole patch is premised on detecting output dependencies and treating them as interchange preventing.

If we take a look at `no_mem_instrs()` in `Transforms/LoopInterchange/interchange-no-deps.ll`, the output dependency on `store i64 %indvars.iv, i64* %ptr, align 4` is a scalar output dependency and is supposed to be legal to interchange; if we treat scalar output dependencies as interchange-preventing, this test would fail. A similar situation occurs for `lcssa_05()` and `lcssa_06()` in `Transforms/LoopInterchange/lcssa.ll`.

Updated test lcssa_08() by adding -da-disable-delinearization-checks so that it can be interchanged.

Fixed an error in my previous update.

Harbormaster completed remote builds in B152620: Diff 413052. Mar 4 2022, 10:21 AM

bmahjour added inline comments. Mar 7 2022, 12:29 PM

llvm/test/Transforms/LoopInterchange/lcssa-preheader.ll
Line 5:
> Just to clarify a bit, the current patch does not prevent interchange for any case, what this patch does is to fix the deficiency that we did not detect output dependency before. I'm hoping this is one step closer to solving the motivating bug.

Does that mean this patch does not solve the motivating problem (pr48057) either? If that's the case, I think we should try to understand the problem better before proceeding further. I noticed a different miscompile due to incorrect handling of scalar dependencies and reported it here: https://github.com/llvm/llvm-project/issues/54176. Scalar dependencies are special in that they carry across all iterations of the loop, so they must be handled with care. For correctness, I think we should treat them more conservatively by default and only allow interchange when it can be proven safe. I'm planning to take a deeper look into the legality logic of loop interchange later this week.
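For reference, the classic direction-vector test for interchanging two loops can be sketched as follows. This is the textbook rule (every dependence direction vector must remain lexicographically non-negative after the two loop levels are swapped), not a transcription of LoopInterchange's own legality code; the string encoding of direction vectors and the function names are assumptions for illustration, and '*' is treated conservatively as possibly '>'.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// True if the first non-'=' entry is '<' (or all entries are '='), i.e. the
// dependence still runs forward in time. '>' and '*' (unknown) fail.
bool isLexNonNegative(const std::string &v) {
  for (char c : v) {
    if (c == '=')
      continue;
    return c == '<';
  }
  return true; // all '=': loop-independent dependence
}

// Each vector has exactly two entries: (outer, inner).
bool interchangeIsLegal(const std::set<std::string> &dirs) {
  for (std::string v : dirs) {
    std::swap(v[0], v[1]); // permute outer <-> inner
    if (!isLexNonNegative(v))
      return false;
  }
  return true;
}
```

Under this rule, the direction vectors of a store to a single scalar location in a 2-deep nest include "<>", which becomes "><" after the swap, so interchange is rejected; a conservative default for scalar dependencies falls out of the rule itself.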

With the recent updates we are no longer losing test coverage. Per discussion in the Loop Opt WG call, I'll approve to get this in as a small incremental improvement. I'll look into https://github.com/llvm/llvm-project/issues/54176 and follow up with more patches.

This revision is now accepted and ready to land. Mar 9 2022, 9:04 AM

In D118102#3370252, @bmahjour wrote:

With the recent updates we are no longer losing test coverage. Per discussion in the Loop Opt WG call, I'll approve to get this in as a small incremental improvement. I'll look into https://github.com/llvm/llvm-project/issues/54176 and follow up with more patches.

Thanks a lot Bardia, very much appreciate it :)

This revision was landed with ongoing or failed builds. Mar 9 2022, 12:56 PM

Closed by commit rGabc8ca65c3de: [LoopInterchange] Detect output dependency of a store instruction with itself (authored by congzhe).

This revision was automatically updated to reflect the committed changes.

congzhe added a commit: rGabc8ca65c3de: [LoopInterchange] Detect output dependency of a store instruction with itself.

Revision Contents

Path | Size
llvm/lib/Transforms/Scalar/LoopInterchange.cpp | 2 lines
llvm/test/Transforms/LoopInterchange/lcssa-preheader.ll | 89 lines
llvm/test/Transforms/LoopInterchange/perserve-lcssa.ll | 8 lines
llvm/test/Transforms/LoopInterchange/pr45743-move-from-inner-preheader.ll | 10 lines

Diff 414192

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

In static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level, ...):

   ValueVector::iterator I, IE, J, JE;
   for (I = MemInstr.begin(), IE = MemInstr.end(); I != IE; ++I) {
     for (J = I, JE = MemInstr.end(); J != JE; ++J) {
       std::vector<char> Dep;
       Instruction *Src = cast<Instruction>(*I);
       Instruction *Dst = cast<Instruction>(*J);
-      if (Src == Dst)
-        continue;
       // Ignore Input dependencies.
       if (isa<LoadInst>(Src) && isa<LoadInst>(Dst))
         continue;
       // Track Output, Flow, and Anti dependencies.
       if (auto D = DI->depends(Src, Dst, true)) {
         assert(D->isOrdered() && "Expected an output, flow or anti dep.");
         LLVM_DEBUG(StringRef DepType =
                        D->isFlow() ? "flow" : D->isAnti() ? "anti" : "output";
   ...
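The effect of removing those two lines can be modelled outside LLVM by counting which (Src, Dst) pairs get handed to dependence analysis. In this C++ sketch, MemInst and countQueriedPairs are hypothetical names; SkipSelf = true reproduces the removed `if (Src == Dst) continue;`, and load/load pairs are skipped as in the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a memory instruction in the loop body.
struct MemInst {
  bool isLoad; // false means store
};

// Count the (Src, Dst) pairs a populateDependencyMatrix-style loop queries.
// J starts at I, so each instruction is also paired with itself unless
// SkipSelf mimics the old `if (Src == Dst) continue;` check.
std::size_t countQueriedPairs(const std::vector<MemInst> &Mem, bool SkipSelf) {
  std::size_t Pairs = 0;
  for (std::size_t I = 0; I < Mem.size(); ++I)
    for (std::size_t J = I; J < Mem.size(); ++J) {
      if (SkipSelf && I == J)
        continue; // behavior before the patch
      if (Mem[I].isLoad && Mem[J].isLoad)
        continue; // input (read-after-read) dependences are ignored
      ++Pairs;
    }
  return Pairs;
}
```

A loop body whose only memory access is one store issues no query at all with SkipSelf = true, and exactly one (the store against itself, an output dependence) with SkipSelf = false.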

llvm/test/Transforms/LoopInterchange/lcssa-preheader.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -basic-aa -loop-interchange -pass-remarks-missed='loop-interchange' -verify-loop-lcssa -S | FileCheck %s
+; RUN: opt < %s -basic-aa -loop-interchange -da-disable-delinearization-checks -pass-remarks-missed='loop-interchange' -verify-loop-lcssa -S | FileCheck -check-prefix=CHECK-DELIN %s

 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; void foo(int n, int m) {			; void foo(int n, int m) {
	; int temp[16][16];			; int temp[16][16];
	; int res[16][16];			; int res[16][16];
	; for(int i = 0; i < n; i++) {			; for(int i = 0; i < n; i++) {
	; for(int j = 0; j < m; j++)			; for(int j = 0; j < m; j++)
	; res[j][i] = temp[j][i];			; res[j][i] = temp[j][i];
	; }			; }
	; }			; }

	define void @lcssa_08(i32 %n, i32 %m) {			;; This loop can be interchanged with -da-disable-delinearization-checks, otherwise it cannot
	; CHECK-LABEL: @lcssa_08(			;; be interchanged due to dependence.
	; CHECK-NEXT: entry:			define void @lcssa_08(i32 %n, i32 %m) {;
	; CHECK-NEXT: [[TEMP:%.*]] = alloca [16 x [16 x i32]], align 4			; CHECK-DELIN-LABEL: @lcssa_08(
	; CHECK-NEXT: [[RES:%.*]] = alloca [16 x [16 x i32]], align 4			; CHECK-DELIN-NEXT: entry:
	; CHECK-NEXT: [[CMP24:%.*]] = icmp sgt i32 [[N:%.*]], 0			; CHECK-DELIN-NEXT: [[TEMP:%.*]] = alloca [16 x [16 x i32]], align 4
	; CHECK-NEXT: br i1 [[CMP24]], label [[INNER_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]			; CHECK-DELIN-NEXT: [[RES:%.*]] = alloca [16 x [16 x i32]], align 4
	; CHECK: outer.preheader:			; CHECK-DELIN-NEXT: [[CMP24:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; CHECK-NEXT: br label [[OUTER_HEADER:%.*]]			; CHECK-DELIN-NEXT: br i1 [[CMP24]], label [[INNER_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
	; CHECK: outer.header:			; CHECK-DELIN: outer.preheader:
	; CHECK-NEXT: [[INDVARS_IV27:%.*]] = phi i64 [ 0, [[OUTER_PREHEADER:%.*]] ], [ [[INDVARS_IV_NEXT28:%.*]], [[OUTER_LATCH:%.*]] ]			; CHECK-DELIN-NEXT: br label [[OUTER_HEADER:%.*]]
	; CHECK-NEXT: [[CMP222:%.*]] = icmp sgt i32 [[M:%.*]], 0			; CHECK-DELIN: outer.header:
	; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[M]] to i64			; CHECK-DELIN-NEXT: [[INDVARS_IV27:%.*]] = phi i64 [ 0, [[OUTER_PREHEADER:%.*]] ], [ [[INDVARS_IV_NEXT28:%.*]], [[OUTER_LATCH:%.*]] ]
	; CHECK-NEXT: br i1 [[CMP222]], label [[INNER_FOR_BODY_SPLIT1:%.*]], label [[INNER_FOR_BODY_SPLIT:%.*]]			; CHECK-DELIN-NEXT: [[CMP222:%.*]] = icmp sgt i32 [[M:%.*]], 0
	; CHECK: inner.preheader:			; CHECK-DELIN-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[M]] to i64
	; CHECK-NEXT: [[WIDE_TRIP_COUNT29:%.*]] = zext i32 [[N]] to i64			; CHECK-DELIN-NEXT: br i1 [[CMP222]], label [[INNER_FOR_BODY_SPLIT1:%.*]], label [[INNER_FOR_BODY_SPLIT:%.*]]
	; CHECK-NEXT: br label [[INNER_FOR_BODY:%.*]]			; CHECK-DELIN: inner.preheader:
	; CHECK: inner.for.body:			; CHECK-DELIN-NEXT: [[WIDE_TRIP_COUNT29:%.*]] = zext i32 [[N]] to i64
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[INNER_PREHEADER]] ], [ [[TMP1:%.*]], [[INNER_FOR_BODY_SPLIT]] ]			; CHECK-DELIN-NEXT: br label [[INNER_FOR_BODY:%.*]]
	; CHECK-NEXT: br label [[OUTER_PREHEADER]]			; CHECK-DELIN: inner.for.body:
	; CHECK: inner.for.body.split1:			; CHECK-DELIN-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[INNER_PREHEADER]] ], [ [[TMP1:%.*]], [[INNER_FOR_BODY_SPLIT]] ]
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [16 x [16 x i32]], [16 x [16 x i32]]* [[TEMP]], i64 0, i64 [[INDVARS_IV]], i64 [[INDVARS_IV27]]			; CHECK-DELIN-NEXT: br label [[OUTER_PREHEADER]]
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX6]], align 4			; CHECK-DELIN: inner.for.body.split1:
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds [16 x [16 x i32]], [16 x [16 x i32]]* [[RES]], i64 0, i64 [[INDVARS_IV]], i64 [[INDVARS_IV27]]			; CHECK-DELIN-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [16 x [16 x i32]], [16 x [16 x i32]]* [[TEMP]], i64 0, i64 [[INDVARS_IV]], i64 [[INDVARS_IV27]]
	; CHECK-NEXT: store i32 [[TMP0]], i32* [[ARRAYIDX8]], align 4			; CHECK-DELIN-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX6]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-DELIN-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds [16 x [16 x i32]], [16 x [16 x i32]]* [[RES]], i64 0, i64 [[INDVARS_IV]], i64 [[INDVARS_IV27]]
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]			; CHECK-DELIN-NEXT: store i32 [[TMP0]], i32* [[ARRAYIDX8]], align 4
	; CHECK-NEXT: br label [[INNER_CRIT_EDGE:%.*]]			; CHECK-DELIN-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK: inner.for.body.split:			; CHECK-DELIN-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
	; CHECK-NEXT: [[WIDE_TRIP_COUNT_LCSSA:%.*]] = phi i64 [ [[WIDE_TRIP_COUNT]], [[OUTER_LATCH]] ], [ [[WIDE_TRIP_COUNT]], [[OUTER_HEADER]] ]			; CHECK-DELIN-NEXT: br label [[INNER_CRIT_EDGE:%.*]]
	; CHECK-NEXT: [[TMP1]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-DELIN: inner.for.body.split:
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ne i64 [[TMP1]], [[WIDE_TRIP_COUNT_LCSSA]]			; CHECK-DELIN-NEXT: [[WIDE_TRIP_COUNT_LCSSA:%.*]] = phi i64 [ [[WIDE_TRIP_COUNT]], [[OUTER_LATCH]] ], [ [[WIDE_TRIP_COUNT]], [[OUTER_HEADER]] ]
	; CHECK-NEXT: br i1 [[TMP2]], label [[INNER_FOR_BODY]], label [[OUTER_CRIT_EDGE:%.*]]			; CHECK-DELIN-NEXT: [[TMP1]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK: inner.crit_edge:			; CHECK-DELIN-NEXT: [[TMP2:%.*]] = icmp ne i64 [[TMP1]], [[WIDE_TRIP_COUNT_LCSSA]]
	; CHECK-NEXT: br label [[OUTER_LATCH]]			; CHECK-DELIN-NEXT: br i1 [[TMP2]], label [[INNER_FOR_BODY]], label [[OUTER_CRIT_EDGE:%.*]]
	; CHECK: outer.latch:			; CHECK-DELIN: inner.crit_edge:
	; CHECK-NEXT: [[INDVARS_IV_NEXT28]] = add nuw nsw i64 [[INDVARS_IV27]], 1			; CHECK-DELIN-NEXT: br label [[OUTER_LATCH]]
	; CHECK-NEXT: [[EXITCOND30:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT28]], [[WIDE_TRIP_COUNT29]]			; CHECK-DELIN: outer.latch:
	; CHECK-NEXT: br i1 [[EXITCOND30]], label [[OUTER_HEADER]], label [[INNER_FOR_BODY_SPLIT]]			; CHECK-DELIN-NEXT: [[INDVARS_IV_NEXT28]] = add nuw nsw i64 [[INDVARS_IV27]], 1
	; CHECK: outer.crit_edge:			; CHECK-DELIN-NEXT: [[EXITCOND30:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT28]], [[WIDE_TRIP_COUNT29]]
	; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]			; CHECK-DELIN-NEXT: br i1 [[EXITCOND30]], label [[OUTER_HEADER]], label [[INNER_FOR_BODY_SPLIT]]
	; CHECK: for.cond.cleanup:			; CHECK-DELIN: outer.crit_edge:
	; CHECK-NEXT: ret void			; CHECK-DELIN-NEXT: br label [[FOR_COND_CLEANUP]]
				; CHECK-DELIN: for.cond.cleanup:
				; CHECK-DELIN-NEXT: ret void
	;			;
	entry:			entry:
	%temp = alloca [16 x [16 x i32]], align 4			%temp = alloca [16 x [16 x i32]], align 4
	%res = alloca [16 x [16 x i32]], align 4			%res = alloca [16 x [16 x i32]], align 4
	%cmp24 = icmp sgt i32 %n, 0			%cmp24 = icmp sgt i32 %n, 0
	br i1 %cmp24, label %outer.preheader, label %for.cond.cleanup			br i1 %cmp24, label %outer.preheader, label %for.cond.cleanup

	outer.preheader: ; preds = %entry			outer.preheader: ; preds = %entry
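The C comment above `@lcssa_08` describes a copy in which each iteration writes its own element of `res` and reads only `temp`, so no two iterations touch the same written location. As a quick semantic sanity check (a sketch, not a model of what DependenceAnalysis computes), the loop body can be run in both orders and the results compared. `n` and `m` are runtime values in the test, so nothing in the IR bounds the subscripts by 16; the sketch simply assumes it. `copyNest` and `Grid` are hypothetical names.

```cpp
#include <array>
#include <cassert>

using Grid = std::array<std::array<int, 16>, 16>;

// Runs res[j][i] = temp[j][i] with i outer / j inner (interchanged == false)
// or with j outer / i inner (interchanged == true) and returns res.
Grid copyNest(const Grid &temp, int n, int m, bool interchanged) {
  assert(n >= 0 && n <= 16 && m >= 0 && m <= 16);  // stay inside [16][16]
  Grid res{};  // zero-initialised so elements that are not written compare equal
  if (!interchanged) {
    for (int i = 0; i < n; i++)
      for (int j = 0; j < m; j++)
        res[j][i] = temp[j][i];
  } else {
    for (int j = 0; j < m; j++)
      for (int i = 0; i < n; i++)
        res[j][i] = temp[j][i];
  }
  return res;
}
```

Because every `(i, j)` pair writes a distinct `res[j][i]`, both orders leave identical contents in `res`.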

llvm/test/Transforms/LoopInterchange/perserve-lcssa.ll

; RUN: opt < %s -loop-interchange -loop-interchange-threshold=-100 -verify-loop-lcssa -S \| FileCheck %s		; RUN: opt < %s -loop-interchange -loop-interchange-threshold=-100 -verify-loop-lcssa -S \| FileCheck %s

; Test case for PR41725. The induction variables in the latches escape the		; Test case for PR41725. The induction variables in the latches escape the
; loops and we must move some PHIs around.		; loops and we must move some PHIs around.

@a = common dso_local global i64 0, align 4		@a = common dso_local global i64 0, align 4
@b = common dso_local global i64 0, align 4		@b = common dso_local global i64 0, align 4
@c = common dso_local global [10 x [1 x i32 ]] zeroinitializer, align 16		@c = common dso_local global [10 x [10 x i32 ]] zeroinitializer, align 16


define void @test_lcssa_indvars1() {		define void @test_lcssa_indvars1() {
; CHECK-LABEL: @test_lcssa_indvars1()		; CHECK-LABEL: @test_lcssa_indvars1()
; CHECK-LABEL: inner.body:		; CHECK-LABEL: inner.body:
; CHECK-NEXT: %iv.inner = phi i64 [ %[[IVNEXT:[0-9]+]], %inner.body.split ], [ 5, %inner.body.preheader ]		; CHECK-NEXT: %iv.inner = phi i64 [ %[[IVNEXT:[0-9]+]], %inner.body.split ], [ 5, %inner.body.preheader ]

; CHECK-LABEL: inner.body.split:		; CHECK-LABEL: inner.body.split:
entry:		entry:
br label %outer.header		br label %outer.header

outer.header: ; preds = %outer.latch, %entry		outer.header: ; preds = %outer.latch, %entry
%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %outer.latch ]		%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %outer.latch ]
br label %inner.body		br label %inner.body

inner.body: ; preds = %inner.body, %outer.header		inner.body: ; preds = %inner.body, %outer.header
%iv.inner = phi i64 [ 5, %outer.header ], [ %iv.inner.next, %inner.body ]		%iv.inner = phi i64 [ 5, %outer.header ], [ %iv.inner.next, %inner.body ]
%v7 = getelementptr inbounds [10 x [1 x i32]], [10 x [1 x i32]]* @c, i64 0, i64 %iv.inner, i64 %iv.outer		%v7 = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @c, i64 0, i64 %iv.inner, i64 %iv.outer
store i32 0, i32* %v7, align 4		store i32 0, i32* %v7, align 4
%iv.inner.next = add nsw i64 %iv.inner, -1		%iv.inner.next = add nsw i64 %iv.inner, -1
%v9 = icmp eq i64 %iv.inner, 0		%v9 = icmp eq i64 %iv.inner, 0
br i1 %v9, label %outer.latch, label %inner.body		br i1 %v9, label %outer.latch, label %inner.body

outer.latch: ; preds = %inner.body		outer.latch: ; preds = %inner.body
%v8.lcssa = phi i64 [ %iv.inner.next, %inner.body ]		%v8.lcssa = phi i64 [ %iv.inner.next, %inner.body ]
%iv.outer.next = add nuw nsw i64 %iv.outer, 1		%iv.outer.next = add nuw nsw i64 %iv.outer, 1
entry:		entry:
br label %outer.header		br label %outer.header

outer.header: ; preds = %outer.latch, %entry		outer.header: ; preds = %outer.latch, %entry
%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %outer.latch ]		%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %outer.latch ]
br label %inner.body		br label %inner.body

inner.body: ; preds = %inner.body, %outer.header		inner.body: ; preds = %inner.body, %outer.header
%iv.inner = phi i64 [ 5, %outer.header ], [ %iv.inner.next, %inner.body ]		%iv.inner = phi i64 [ 5, %outer.header ], [ %iv.inner.next, %inner.body ]
%v7 = getelementptr inbounds [10 x [1 x i32]], [10 x [1 x i32]]* @c, i64 0, i64 %iv.inner, i64 %iv.outer		%v7 = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @c, i64 0, i64 %iv.inner, i64 %iv.outer
store i32 0, i32* %v7, align 4		store i32 0, i32* %v7, align 4
%iv.inner.next = add nsw i64 %iv.inner, -1		%iv.inner.next = add nsw i64 %iv.inner, -1
%v9 = icmp eq i64 %iv.inner.next, 0		%v9 = icmp eq i64 %iv.inner.next, 0
br i1 %v9, label %outer.latch, label %inner.body		br i1 %v9, label %outer.latch, label %inner.body

outer.latch: ; preds = %inner.body		outer.latch: ; preds = %inner.body
%v8.lcssa = phi i64 [ %iv.inner, %inner.body ]		%v8.lcssa = phi i64 [ %iv.inner, %inner.body ]
%iv.outer.next = add nuw nsw i64 %iv.outer, 1		%iv.outer.next = add nuw nsw i64 %iv.outer, 1
entry:		entry:
br label %outer.header		br label %outer.header

outer.header: ; preds = %outer.latch, %entry		outer.header: ; preds = %outer.latch, %entry
%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %outer.latch ]		%iv.outer = phi i64 [ 0, %entry ], [ %iv.outer.next, %outer.latch ]
br label %inner.body		br label %inner.body

inner.body: ; preds = %inner.body, %outer.header		inner.body: ; preds = %inner.body, %outer.header
%iv.inner = phi i64 [ 5, %outer.header ], [ %iv.inner.next, %inner.body ]		%iv.inner = phi i64 [ 5, %outer.header ], [ %iv.inner.next, %inner.body ]
%v7 = getelementptr inbounds [10 x [1 x i32]], [10 x [1 x i32]]* @c, i64 0, i64 %iv.inner, i64 %iv.outer		%v7 = getelementptr inbounds [10 x [10 x i32]], [10 x [10 x i32]]* @c, i64 0, i64 %iv.inner, i64 %iv.outer
store i32 0, i32* %v7, align 4		store i32 0, i32* %v7, align 4
%iv.inner.next = add nsw i64 %iv.inner, -1		%iv.inner.next = add nsw i64 %iv.inner, -1
%v9 = icmp eq i64 %iv.inner, 0		%v9 = icmp eq i64 %iv.inner, 0
br i1 %v9, label %outer.latch, label %inner.body		br i1 %v9, label %outer.latch, label %inner.body

outer.latch: ; preds = %inner.body		outer.latch: ; preds = %inner.body
%v8.lcssa = phi i64 [ %iv.inner.next, %inner.body ]		%v8.lcssa = phi i64 [ %iv.inner.next, %inner.body ]
;%const.lcssa = phi i64 [ 111, %inner.body ]		;%const.lcssa = phi i64 [ 111, %inner.body ]
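In this file `@c` changes from `[10 x [1 x i32]]` to `[10 x [10 x i32]]`. One plausible reading (the review text here does not say so explicitly) is that with a one-element inner dimension, `c[iv.inner][iv.outer]` lets different iterations of the single store hit the same element, which is exactly the store-with-itself output dependence this patch starts detecting. The brute-force sketch below flattens the subscripts row-major and looks for such a pair. `findSelfCollision` is hypothetical, and the outer trip count is a parameter because the outer exit condition is not visible in this excerpt.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>

using Iter = std::pair<long, long>;  // (iv.outer, iv.inner)

// Flattens c[iv.inner][iv.outer] for an array whose rows hold InnerDim
// elements, and returns the first two distinct iterations that write the
// same element, if any.
std::optional<std::pair<Iter, Iter>>
findSelfCollision(long InnerDim, long OuterTrips) {
  assert(InnerDim > 0 && OuterTrips > 0);
  std::map<long, Iter> FirstWriter;  // flat offset -> first writing iteration
  for (long Outer = 0; Outer < OuterTrips; ++Outer) {
    // iv.inner starts at 5 and counts down to 0, as in @test_lcssa_indvars1.
    for (long Inner = 5; Inner >= 0; --Inner) {
      long Offset = Inner * InnerDim + Outer;  // row-major flattening
      auto [It, Inserted] = FirstWriter.emplace(Offset, Iter{Outer, Inner});
      if (!Inserted)
        return std::make_pair(It->second, Iter{Outer, Inner});
    }
  }
  return std::nullopt;
}
```

With a row length of 1 and two outer iterations, iterations (outer 0, inner 5) and (outer 1, inner 4) both write flat offset 5; with a row length of 10, no collision occurs as long as the outer loop runs at most 10 times.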

llvm/test/Transforms/LoopInterchange/pr45743-move-from-inner-preheader.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -loop-interchange -S %s \| FileCheck %s			; RUN: opt -loop-interchange -S %s \| FileCheck %s

	@global = external local_unnamed_addr global [2 x [10 x i32]], align 16			@global = external local_unnamed_addr global [400 x [400 x i32]], align 16

	; We need to move %tmp4 from the inner loop pre header to the outer loop header			; We need to move %tmp4 from the inner loop pre header to the outer loop header
	; before interchanging.			; before interchanging.
	define void @test1() local_unnamed_addr #0 {			define void @test1() local_unnamed_addr #0 {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[INNER_PH:%.*]]			; CHECK-NEXT: br label [[INNER_PH:%.*]]
	; CHECK: outer.header.preheader:			; CHECK: outer.header.preheader:
	; CHECK-NEXT: br label [[OUTER_HEADER:%.*]]			; CHECK-NEXT: br label [[OUTER_HEADER:%.*]]
	; CHECK: outer.header:			; CHECK: outer.header:
	; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ [[OUTER_IV_NEXT:%.*]], [[OUTER_LATCH:%.*]] ], [ 0, [[OUTER_HEADER_PREHEADER:%.*]] ]			; CHECK-NEXT: [[OUTER_IV:%.*]] = phi i64 [ [[OUTER_IV_NEXT:%.*]], [[OUTER_LATCH:%.*]] ], [ 0, [[OUTER_HEADER_PREHEADER:%.*]] ]
	; CHECK-NEXT: [[INNER_RED:%.*]] = phi i32 [ [[OUTER_RED:%.*]], [[OUTER_HEADER_PREHEADER]] ], [ [[RED_NEXT:%.*]], [[OUTER_LATCH]] ]			; CHECK-NEXT: [[INNER_RED:%.*]] = phi i32 [ [[OUTER_RED:%.*]], [[OUTER_HEADER_PREHEADER]] ], [ [[RED_NEXT:%.*]], [[OUTER_LATCH]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw i64 [[OUTER_IV]], 9			; CHECK-NEXT: [[TMP4:%.*]] = add nsw i64 [[OUTER_IV]], 9
	; CHECK-NEXT: br label [[INNER_SPLIT1:%.*]]			; CHECK-NEXT: br label [[INNER_SPLIT1:%.*]]
	; CHECK: inner.ph:			; CHECK: inner.ph:
	; CHECK-NEXT: br label [[INNER:%.*]]			; CHECK-NEXT: br label [[INNER:%.*]]
	; CHECK: inner:			; CHECK: inner:
	; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[INNER_PH]] ], [ [[TMP0:%.*]], [[INNER_SPLIT:%.*]] ]			; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[INNER_PH]] ], [ [[TMP0:%.*]], [[INNER_SPLIT:%.*]] ]
	; CHECK-NEXT: [[OUTER_RED]] = phi i32 [ [[RED_NEXT_LCSSA:%.*]], [[INNER_SPLIT]] ], [ 0, [[INNER_PH]] ]			; CHECK-NEXT: [[OUTER_RED]] = phi i32 [ [[RED_NEXT_LCSSA:%.*]], [[INNER_SPLIT]] ], [ 0, [[INNER_PH]] ]
	; CHECK-NEXT: br label [[OUTER_HEADER_PREHEADER]]			; CHECK-NEXT: br label [[OUTER_HEADER_PREHEADER]]
	; CHECK: inner.split1:			; CHECK: inner.split1:
	; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds [2 x [10 x i32]], [2 x [10 x i32]]* @global, i64 0, i64 [[INNER_IV]], i64 [[TMP4]]			; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds [400 x [400 x i32]], [400 x [400 x i32]]* @global, i64 0, i64 [[INNER_IV]], i64 [[TMP4]]
	; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4			; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4
	; CHECK-NEXT: [[RED_NEXT]] = or i32 [[INNER_RED]], 20			; CHECK-NEXT: [[RED_NEXT]] = or i32 [[INNER_RED]], 20
	; CHECK-NEXT: [[INNER_IV_NEXT:%.*]] = add nsw i64 [[INNER_IV]], 1			; CHECK-NEXT: [[INNER_IV_NEXT:%.*]] = add nsw i64 [[INNER_IV]], 1
	; CHECK-NEXT: [[EC_1:%.*]] = icmp eq i64 [[INNER_IV_NEXT]], 400			; CHECK-NEXT: [[EC_1:%.*]] = icmp eq i64 [[INNER_IV_NEXT]], 400
	; CHECK-NEXT: br label [[OUTER_LATCH]]			; CHECK-NEXT: br label [[OUTER_LATCH]]
	; CHECK: inner.split:			; CHECK: inner.split:
	; CHECK-NEXT: [[RED_NEXT_LCSSA]] = phi i32 [ [[RED_NEXT]], [[OUTER_LATCH]] ]			; CHECK-NEXT: [[RED_NEXT_LCSSA]] = phi i32 [ [[RED_NEXT]], [[OUTER_LATCH]] ]
	; CHECK-NEXT: [[TMP0]] = add nsw i64 [[INNER_IV]], 1			; CHECK-NEXT: [[TMP0]] = add nsw i64 [[INNER_IV]], 1

	inner.ph: ; preds = %bb1			inner.ph: ; preds = %bb1
	%tmp4 = add nsw i64 %outer.iv, 9			%tmp4 = add nsw i64 %outer.iv, 9
	br label %inner			br label %inner

	inner: ; preds = %bb5, %bb3			inner: ; preds = %bb5, %bb3
	%inner.iv = phi i64 [ 0, %inner.ph ], [ %inner.iv.next, %inner ]			%inner.iv = phi i64 [ 0, %inner.ph ], [ %inner.iv.next, %inner ]
	%inner.red = phi i32 [ %outer.red, %inner.ph ], [ %red.next, %inner ]			%inner.red = phi i32 [ %outer.red, %inner.ph ], [ %red.next, %inner ]
	%ptr = getelementptr inbounds [2 x [10 x i32]], [2 x [10 x i32]]* @global, i64 0, i64 %inner.iv, i64 %tmp4			%ptr = getelementptr inbounds [400 x [400 x i32]], [400 x [400 x i32]]* @global, i64 0, i64 %inner.iv, i64 %tmp4
	store i32 0, i32* %ptr			store i32 0, i32* %ptr
	%red.next = or i32 %inner.red, 20			%red.next = or i32 %inner.red, 20
	%inner.iv.next = add nsw i64 %inner.iv, 1			%inner.iv.next = add nsw i64 %inner.iv, 1
	%ec.1 = icmp eq i64 %inner.iv.next, 400			%ec.1 = icmp eq i64 %inner.iv.next, 400
	br i1 %ec.1, label %outer.latch, label %inner			br i1 %ec.1, label %outer.latch, label %inner

	outer.latch: ; preds = %bb5			outer.latch: ; preds = %bb5
	%red.next.lcssa = phi i32 [ %red.next, %inner ]			%red.next.lcssa = phi i32 [ %red.next, %inner ]
	; CHECK-NEXT: br label [[INNER_PH:%.*]]			; CHECK-NEXT: br label [[INNER_PH:%.*]]
	; CHECK: inner.ph:			; CHECK: inner.ph:
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw i64 [[OUTER_IV]], 9			; CHECK-NEXT: [[TMP4:%.*]] = add nsw i64 [[OUTER_IV]], 9
	; CHECK-NEXT: call void @side_effect()			; CHECK-NEXT: call void @side_effect()
	; CHECK-NEXT: br label [[INNER:%.*]]			; CHECK-NEXT: br label [[INNER:%.*]]
	; CHECK: inner:			; CHECK: inner:
	; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[INNER_PH]] ], [ [[INNER_IV_NEXT:%.*]], [[INNER]] ]			; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[INNER_PH]] ], [ [[INNER_IV_NEXT:%.*]], [[INNER]] ]
	; CHECK-NEXT: [[INNER_RED:%.*]] = phi i32 [ [[OUTER_RED]], [[INNER_PH]] ], [ [[RED_NEXT:%.*]], [[INNER]] ]			; CHECK-NEXT: [[INNER_RED:%.*]] = phi i32 [ [[OUTER_RED]], [[INNER_PH]] ], [ [[RED_NEXT:%.*]], [[INNER]] ]
	; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds [2 x [10 x i32]], [2 x [10 x i32]]* @global, i64 0, i64 [[INNER_IV]], i64 [[TMP4]]			; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds [400 x [400 x i32]], [400 x [400 x i32]]* @global, i64 0, i64 [[INNER_IV]], i64 [[TMP4]]
	; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4			; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4
	; CHECK-NEXT: [[RED_NEXT]] = or i32 [[INNER_RED]], 20			; CHECK-NEXT: [[RED_NEXT]] = or i32 [[INNER_RED]], 20
	; CHECK-NEXT: [[INNER_IV_NEXT]] = add nsw i64 [[INNER_IV]], 1			; CHECK-NEXT: [[INNER_IV_NEXT]] = add nsw i64 [[INNER_IV]], 1
	; CHECK-NEXT: [[EC_1:%.*]] = icmp eq i64 [[INNER_IV_NEXT]], 400			; CHECK-NEXT: [[EC_1:%.*]] = icmp eq i64 [[INNER_IV_NEXT]], 400
	; CHECK-NEXT: br i1 [[EC_1]], label [[OUTER_LATCH]], label [[INNER]]			; CHECK-NEXT: br i1 [[EC_1]], label [[OUTER_LATCH]], label [[INNER]]
	; CHECK: outer.latch:			; CHECK: outer.latch:
	; CHECK-NEXT: [[RED_NEXT_LCSSA]] = phi i32 [ [[RED_NEXT]], [[INNER]] ]			; CHECK-NEXT: [[RED_NEXT_LCSSA]] = phi i32 [ [[RED_NEXT]], [[INNER]] ]
	; CHECK-NEXT: [[OUTER_IV_NEXT]] = add nsw i64 [[OUTER_IV]], 1			; CHECK-NEXT: [[OUTER_IV_NEXT]] = add nsw i64 [[OUTER_IV]], 1
	inner.ph: ; preds = %bb1			inner.ph: ; preds = %bb1
	%tmp4 = add nsw i64 %outer.iv, 9			%tmp4 = add nsw i64 %outer.iv, 9
	call void @side_effect()			call void @side_effect()
	br label %inner			br label %inner

	inner: ; preds = %bb5, %bb3			inner: ; preds = %bb5, %bb3
	%inner.iv = phi i64 [ 0, %inner.ph ], [ %inner.iv.next, %inner ]			%inner.iv = phi i64 [ 0, %inner.ph ], [ %inner.iv.next, %inner ]
	%inner.red = phi i32 [ %outer.red, %inner.ph ], [ %red.next, %inner ]			%inner.red = phi i32 [ %outer.red, %inner.ph ], [ %red.next, %inner ]
	%ptr = getelementptr inbounds [2 x [10 x i32]], [2 x [10 x i32]]* @global, i64 0, i64 %inner.iv, i64 %tmp4			%ptr = getelementptr inbounds [400 x [400 x i32]], [400 x [400 x i32]]* @global, i64 0, i64 %inner.iv, i64 %tmp4
	store i32 0, i32* %ptr			store i32 0, i32* %ptr
	%red.next = or i32 %inner.red, 20			%red.next = or i32 %inner.red, 20
	%inner.iv.next = add nsw i64 %inner.iv, 1			%inner.iv.next = add nsw i64 %inner.iv, 1
	%ec.1 = icmp eq i64 %inner.iv.next, 400			%ec.1 = icmp eq i64 %inner.iv.next, 400
	br i1 %ec.1, label %outer.latch, label %inner			br i1 %ec.1, label %outer.latch, label %inner

	outer.latch: ; preds = %bb5			outer.latch: ; preds = %bb5
	%red.next.lcssa = phi i32 [ %red.next, %inner ]			%red.next.lcssa = phi i32 [ %red.next, %inner ]
	%outer.iv.next = add nsw i64 %outer.iv, 1			%outer.iv.next = add nsw i64 %outer.iv, 1
	%ec.2 = icmp eq i64 %outer.iv.next, 400			%ec.2 = icmp eq i64 %outer.iv.next, 400
	br i1 %ec.2, label %exit, label %outer.header			br i1 %ec.2, label %exit, label %outer.header

	exit: ; preds = %bb11			exit: ; preds = %bb11
	ret void			ret void
	}			}
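In this test `@global` changes from `[2 x [10 x i32]]` to `[400 x [400 x i32]]`. A brute-force way to see what that does to the store in `@test1` is to enumerate the iteration space and compare how many times the store executes with how many distinct flat elements it touches; fewer distinct elements means some element is written by more than one iteration, i.e. the store has an output dependence with itself. This sketch assumes both loops run 400 iterations, as the `icmp eq ..., 400` exit conditions in the visible part of the test suggest. `footprint` and `StoreFootprint` are hypothetical names.

```cpp
#include <cassert>
#include <unordered_set>

struct StoreFootprint {
  long Executed;  // number of dynamic executions of the store
  long Distinct;  // number of distinct flat offsets written
};

// Models %tmp4 = outer.iv + 9 and %ptr = &global[inner.iv][%tmp4] for an
// array whose rows hold Cols elements, with both IVs running 0..399.
StoreFootprint footprint(long Cols) {
  assert(Cols > 0);
  std::unordered_set<long> Offsets;
  long Executed = 0;
  for (long OuterIV = 0; OuterIV < 400; ++OuterIV) {
    long Tmp4 = OuterIV + 9;
    for (long InnerIV = 0; InnerIV < 400; ++InnerIV) {
      Offsets.insert(InnerIV * Cols + Tmp4);  // row-major flattening
      ++Executed;
    }
  }
  return {Executed, static_cast<long>(Offsets.size())};
}
```

With rows of 10 elements the 160000 stores land on only 4390 distinct offsets, while with rows of 400 every store writes a different offset. Note that `%tmp4` still reaches 408, past the end of a 400-element row, so this flat-offset count says nothing about whether dependence analysis can delinearize the access.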