This is an archive of the discontinued LLVM Phabricator instance.

[Patch] Loop Interchange Pass
Closed (Public)

Authored by karthikthecool on Feb 9 2015, 5:24 AM.


Details

Reviewers

jmolloy
hfinkel
pekka.jaaskelainen

Summary

Hi All,
Please find attached the patch for a Loop Interchange Pass for LLVM. The initial RFC and design were submitted at http://reviews.llvm.org/D7432.
This pass is disabled by default.

To give a brief introduction, it consists of 3 stages:

LoopInterchangeLegality: checks the legality of loop interchange based on distance/direction vectors.
LoopInterchangeProfitability: a very basic heuristic to check for profitability. This will evolve over time.
LoopInterchangeTransform: does the actual transform.
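As a rough illustration of what a direction-vector legality check can look like for a 2-deep nest, here is a minimal sketch of the textbook criterion. It is not the patch's code, and the DirVec2 encoding and function names are hypothetical. Interchanging the two loops swaps the two entries of every dependence's direction vector, and the interchange is rejected if any swapped vector would have '>' as its first non-'=' entry.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical encoding (not the patch's data structures): one direction per
// loop level, outermost first. '<' = carried forward by that loop,
// '=' = same iteration, '>' = backward, '*' = unknown.
using DirVec2 = std::array<char, 2>;

// A dependence is valid only if its first non-'=' direction is '<'; a vector
// of all '=' is a loop-independent dependence. An unknown '*' is treated
// conservatively as if it might be '>'.
bool isLexicographicallyNonNegative(const DirVec2 &D) {
  for (char C : D) {
    if (C == '=')
      continue;
    return C == '<';
  }
  return true;
}

// Interchanging a 2-deep nest swaps the two entries of every direction
// vector; the interchange is legal only if every swapped vector is valid.
bool isInterchangeLegal(const std::vector<DirVec2> &Deps) {
  for (const DirVec2 &D : Deps) {
    const DirVec2 Swapped = {D[1], D[0]};
    if (!isLexicographicallyNonNegative(Swapped))
      return false;
  }
  return true;
}
```

For example, a recurrence such as aa[j][i] = aa[j-1][i] + ... with i outer has direction (=, <), which becomes (<, =) after interchange and is therefore legal, while (<, >) would become (>, <) and is rejected.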

Current Limitations:

Only handles loop nests of depth 2 for now. I will extend it to support loops of any depth, as James suggested during the RFC.
Triangular loops are not yet supported.

As Hal had suggested during the RFC, I went through the TSVC benchmark. Unfortunately I didn't get time to run it, but I went through the test cases for loop interchange. One of the test cases, s231(), which was not being vectorized previously, now gets vectorized. Added a similar test case in this patch.

To the best of my knowledge, this patch works and produces correct results (i.e. interchanging doesn't change the output of the program).
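One way to repeat this kind of correctness check is to run the loop nest in both orders on the same deterministic input and compare the results. A minimal sketch, assuming an s231-style kernel aa[j][i] = aa[j-1][i] + bb[j][i] (recalled from TSVC, so treat the exact form as an assumption); Grid, makeGrid, runOriginal and runInterchanged are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Hypothetical s231-style kernel, original order: i outer, j inner.
// aa[j][i] depends on aa[j-1][i], i.e. direction (=, <) for (i, j).
Grid runOriginal(Grid AA, const Grid &BB) {
  const std::size_t N = AA.size();
  for (std::size_t I = 0; I < N; ++I)
    for (std::size_t J = 1; J < N; ++J)
      AA[J][I] = AA[J - 1][I] + BB[J][I];
  return AA;
}

// Same kernel after interchange: j outer, i inner (unit stride in aa[j][i]).
Grid runInterchanged(Grid AA, const Grid &BB) {
  const std::size_t N = AA.size();
  for (std::size_t J = 1; J < N; ++J)
    for (std::size_t I = 0; I < N; ++I)
      AA[J][I] = AA[J - 1][I] + BB[J][I];
  return AA;
}

// Deterministic pseudo-random N x N input so both runs see identical data.
Grid makeGrid(std::size_t N, unsigned Seed) {
  Grid G(N, std::vector<int>(N));
  for (auto &Row : G)
    for (int &V : Row) {
      Seed = Seed * 1103515245u + 12345u; // simple LCG step
      V = static_cast<int>((Seed >> 16) % 100u);
    }
  return G;
}
```

Because each column i is an independent recurrence along j, both orders perform exactly the same additions on the same operands, so even a floating-point version of this kernel would compare equal.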

I would like some comments on how to go about writing test cases for this transform. Please let me know your input on this.
Also, is it OK to do further development on trunk once this patch is finalized?

Thanks and Regards
Karthik Bhat

Diff Detail

Repository: rL LLVM

Event Timeline

karthikthecool retitled this revision to [Patch] Loop Interchange Pass. Feb 9 2015, 5:24 AM

karthikthecool updated this object.

karthikthecool edited the test plan for this revision.

karthikthecool added reviewers: hfinkel, jmolloy, pekka.jaaskelainen.

karthikthecool updated this revision to Diff 19574. Feb 9 2015, 5:24 AM

karthikthecool set the repository for this revision to rL LLVM.

karthikthecool added a subscriber: Unknown Object (MLST).

Please adjust all of the variable names to start with a capital letter, see: http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

> As Hal had suggested during the RFC, I went through the TSVC benchmark. Unfortunately I didn't get time to run it, but I went through the test cases for loop interchange. One of the test cases, s231(), which was not being vectorized previously, now gets vectorized. Added a similar test case in this patch.

Great, thanks!

> Also, is it OK to do further development on trunk once this patch is finalized?

Yes, once this functionality is finalized, we'll move further development to trunk.

include/llvm/Transforms/Scalar.h
143	How about, "This pass interchanges loops to provide a more cache-friendly memory access patterns."
lib/Transforms/Scalar/LoopInterchange.cpp
50	Please add a comment explaining what this function computes?
55	Did you mean += ?
62	LoopInfo already has a getLoopDepth() function. Can you use that?
76	This is not actually what you want. If the loop is branched to by, for example, multiple entries of the switch statement, the predecessor can be listed multiple times in the predecessor list (and, thus, you'll have more than two incoming values even though you have only 2 predecessor blocks). I suspect that what you actually want is that there is a unique latch and a unique predecessor, so you want that L->getLoopLatch() && L->getLoopPredecessor() [neither are nullptr].
82	Do you also need to check that AddRec->isAffine()?
87	Let's say: // FIXME: Handle loops with more than one induction variable. Note that, currently, legality makes sure we have only one induction variable.
213	I'd move this FIXME comment somewhere else, it is not particularly useful here. It is more useful to tag places that assume only two levels.
262	Either you should handle the case where the dyn_cast fails, or if it can't fail (because we've already verified that this must be a BranchInst), then use cast<> instead. This same comment applies to many places below as well. Only use dyn_cast if the cast can fail (in which case you should handle the nullptr case). Otherwise, use cast<>.
274	Space before 'Any'
276	licm -> LICM
298	What are you actually trying to check here? Instructions with side effects? Maybe you want I->mayHaveSideEffects()?
310	These checks look identical to those above, please make a function (a lambda function is fine).
364	Can you include Src and Des in these debug messages so that we can see what instructions are relevant?
416	How are you checking for reductions here? Do you need to check that the one PHI you've found is not used outside of the loop?
447	Why are you only counting uses in the latch block? Should the increment be in some other block, then what?
460	Let's say, "Inner or outer loops lack a preheader"? Also, for the future, adding a preheader when one is not present is pretty easy (you just need to call InsertPreheaderForLoop from llvm/Transforms/Utils/LoopUtils.h), so this is a limitation that should be removed sooner rather than later (although after the initial commit is okay).
501	if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(UseInstr)) { ... }
503	What happens if it is not the IV directly, but some expression of the IV? I think you'd be better off using ScalarEvolution here, get the AddRec of the GEP, and see if the "outer" AddRec is provided in terms of the SCEV of the IV (or something like that).
551	Use SplitBlock from include/llvm/Transforms/Utils/BasicBlockUtils.h? (same for other functions below)?
726	Use llvm_unreachable, not assert(0 &&
788	Why?
818	PHIs are always at the beginning of the block; once you hit the first non-PHI, you can exit the loop (you should never find another).
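The point in the comment at line 76 can be seen in a toy CFG model. A PHI has one incoming entry per incoming edge, so a switch with two cases that target the loop header contributes its block twice. This is a sketch with hypothetical names (PredList, countIncomingEdges, countUniquePreds). In LLVM itself, the suggested check is simply that L->getLoopLatch() and L->getLoopPredecessor() are both non-null.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical CFG model: each entry is the id of the block on one incoming
// edge of the loop header. A PHI in the header has one incoming entry per edge.
using PredList = std::vector<int>;

// What counting a header PHI's incoming values measures: edges, not blocks.
std::size_t countIncomingEdges(const PredList &Preds) { return Preds.size(); }

// Number of distinct predecessor blocks.
std::size_t countUniquePreds(const PredList &Preds) {
  return std::set<int>(Preds.begin(), Preds.end()).size();
}
```

With a switch block listed twice plus a latch, the PHI sees three incoming entries even though the header has only two predecessor blocks, which is why "exactly two incoming values" is the wrong test.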

Hi Hal,
Thanks for the review. Please find my comments updated below. Will upload modified patch shortly.

P.S. Sorry for the long comments,
Thanks and Regards
Karthik Bhat

include/llvm/Transforms/Scalar.h
143	Yes of course your comment makes more sense.
lib/Transforms/Scalar/LoopInterchange.cpp
50	This function gets the maximum nesting level of the innermost loop. We use it to push loops of depth 2 to the worklist. For example, in
    for (int i = 0; i < N; i++)
      for (int j = 0; j < M; j++)
        for (int k = 0; k < K; k++)
we want to return 3, as the maximum nesting level is 3. I have renamed the function, added a comment, and modified it a bit to correctly return the maximum loop depth when there are multiple inner loops. For example, in
    for (int i = 0; i < N; i++) {
      for (int j = 0; j < M; j++) {
        for (int k = 0; k < K; k++) {
          // this loop has depth 3
        }
      }
      for (int k = 0; k < K; k++) {
        // this loop has depth 2
      }
    }
we still return 3, as that is the maximum depth.
62	getLoopDepth currently only returns the nesting level of the current loop. Since we only have access to the outer loop here, we always get a nesting level of 1, so I had to go with the recursive function above.
76	Updated the code. Thanks for clarifying the problem.
82	Yes, I feel we need to check isAffine as well. Thanks, updated the code.
87	OK. Done.
213	OK.
262	Updated the code to use cast<> wherever possible. Added null checks in places where dyn_cast is being used.
274	Modified comment.
276	Modified comment.
298	Hi Hal, the way I'm trying to conclude that a loop is tightly nested is as follows. First, there should not be any extra block between the outer loop and the inner loop (i.e. the outer loop header should branch to the inner loop preheader/inner loop body, and the other branch in the header should go to the outer loop latch). This check catches loops that have a block in between the outer and inner loop, such as
    for (int i = 0; i < N; i++) {
      if (X) {
      }
      for (int j = 0; j < N; j++) {
      }
    }
and concludes that they are not tightly nested. A second type of non-tightly-nested loop is
    for (int i = 0; i < N; i++) {
      a = i;
      k = A[i];
      for (int j = 0; j < N; j++) {
      }
    }
These are caught by the second check, which verifies that the only use of the induction variable in the latch or header is the operand to the induction PHI (i.e. the increment/decrement of the loop counter). I have modified this function a bit in the updated patch. In the third case I was trying to catch loops such as
    for (int i = 0; i < N; i++) {
      foo();
      for (int j = 0; j < N; j++) {
      }
    }
I think we can do this using a combination of mayHaveSideEffects and mayReadFromMemory. Updated the patch.
310	Done.
364	Done.
416	We currently check that there is only one PHI node in the loop header, which will correspond to the induction variable. If we find any other PHIs, either due to reductions or a triangular loop structure, we currently bail out; this is a current limitation.
447	We count the uses only in the loop latch because we split the latch based on this instruction. This was done because the mentioned example generated code like:
    for.body3: ; preds = %for.body3, %for.body3.lr.ph
      %j.018 = phi i32 [ 0, %for.body3.lr.ph ], [ %add6, %for.body3 ]
      %arrayidx4 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %j.018, i32 %i.020
      %5 = load i32* %arrayidx4, align 4, !tbaa !1
      %add = add nsw i32 %3, %5
      %add6 = add nuw nsw i32 %j.018, 1
      %arrayidx8 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %add6, i32 %add5
      store i32 %add, i32* %arrayidx8, align 4, !tbaa !1
      %exitcond = icmp eq i32 %j.018, %4
      br i1 %exitcond, label %for.inc9.loopexit, label %for.body3
Since we cannot split at %add6 = add nuw nsw i32 %j.018, 1, we give up in this case. But now that I think about it, counting uses may not be the right way to check whether we can split the inner loop latch. Consider the following valid loop, where we fail with this check:
    for (int i = 0; i < 100; i++)
      for (int j = 0; j < 100; j++)
        A[j][i] = A[j][i] + k;
Here the inner loop latch is:
    for.body3: ; preds = %for.body3, %for.cond1.preheader
      %j.015 = phi i32 [ 0, %for.cond1.preheader ], [ %inc, %for.body3 ]
      %arrayidx4 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %j.015, i32 %i.016
      %1 = load i32* %arrayidx4, align 4, !tbaa !1
      %add = add nsw i32 %0, %1
      store i32 %add, i32* %arrayidx4, align 4, !tbaa !1
      %inc = add nuw nsw i32 %j.015, 1
      %exitcond = icmp eq i32 %inc, 100
      br i1 %exitcond, label %for.inc7, label %for.body3
This could have been split at %inc = add nuw nsw i32 %j.015, 1, but we fail because we find more than one use. Modified the logic to check for a tightly grouped inner loop latch that can be split.
460	Updated code to add a preheader when not present.
501	Done.
503	Updated the code to use SCEV to find the loop that provides each operand, to decide whether a load is good or bad. After the change we can handle code like for (i = 0; i < N; i += 1) for (j = 0; j < N; j++) A[j-1][i-1] = A[j-1][i-1] + C[j-1][i-1]; which now gets vectorized after interchange.
551	Updated code.
726	Done.
818	Yes you are right. Modified code.
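The recursive maximum-depth computation described in the reply to comment 50 can be sketched on a hypothetical loop-tree node (LoopNode is a stand-in for illustration, not LLVM's Loop class). The tree built in the usage mirrors the i/j/k example in that reply, where the sibling k loop at depth 2 must not hide the depth-3 nest.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a loop-tree node; it is not LLVM's Loop class.
struct LoopNode {
  std::vector<std::unique_ptr<LoopNode>> SubLoops;

  // Creates a child loop nested directly inside this one.
  LoopNode *addSubLoop() {
    SubLoops.push_back(std::unique_ptr<LoopNode>(new LoopNode()));
    return SubLoops.back().get();
  }
};

// Depth of the deepest loop in the tree rooted at L, counting L as depth 1.
unsigned getMaxNestingDepth(const LoopNode &L) {
  unsigned MaxSubDepth = 0;
  for (const auto &Sub : L.SubLoops)
    MaxSubDepth = std::max(MaxSubDepth, getMaxNestingDepth(*Sub));
  return 1 + MaxSubDepth;
}
```

Taking the maximum over all children, rather than following only the first child, is what keeps the result at 3 for the example with two inner loops.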

Hi Hal,
Thanks a lot for the review. Updated the patch to address the review comments. Also fixed a few issues which I found during testing.
Major changes include:

The logic to calculate profitability has been made more accurate.
The logic to detect where to split the inner loop has been changed to be more accurate.
Added a test case to check the updated profitability model.

Please let me know your inputs on this.

This still needs major work to support generic loop depths and improved profitability model.
Hopefully will be able to complete it with help from the community.

Thanks once again for the support.

Regards
Karthik Bhat

hfinkel added inline comments. Feb 12 2015, 10:42 PM

lib/Transforms/Scalar/LoopInterchange.cpp
416	No, I mean uses outside of the loops in general. I don't think you check for that. You check for:
    if (numUsageinLatch + numUsageinHeader != 1) return false;
but the PHI could be used in any block dominated by the loop. Do you need to check for that?
    int i, j;
    for (i = 0; i < n; ++i)
      for (j = 0; j < m; ++j)
        a[i][j] = 7;
    cout << "final i, j = " << i << ", " << j << "\n";

karthikthecool added inline comments. Feb 13 2015, 1:59 AM

lib/Transforms/Scalar/LoopInterchange.cpp
416	Hi Hal, the way I was handling this was: since the loops were tightly coupled, we were getting the LCSSA PHIs for these loops in the outer loop latch, which I was splitting and moving outside the loop. I was able to get the correct values for i and j in this case. But I think I can add a check to avoid these cases as well. Since we do not want uses outside the loop, is it OK to have a check like
    if (isa<PHINode>(InnerLoopLatch->begin())) return false;
    if (isa<PHINode>(OuterLoopLatch->begin())) return false;
This will make sure that no value defined inside the loop is used outside it. Does this check look good? Will it make this transform too restrictive? Thanks for answering my silly queries; I'm still getting hold of loop optimizations. Regards, Karthik Bhat

hfinkel added inline comments. Feb 13 2015, 2:27 AM

lib/Transforms/Scalar/LoopInterchange.cpp
416	Ah, you're right. I think that, given our current restrictions, the final values outside the loop nest will always be the same, so this is fine. (we should have a regression test showing that we still can interchange in this case).
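A small check of the claim that the final induction-variable values seen after the nest are the same in either order. This sketch assumes both trip counts are positive; the last usage assertion shows that the values can differ when the inner trip count is zero. The function names are hypothetical, and the loop body from Hal's example is omitted because it touches neither i nor j.

```cpp
#include <cassert>
#include <utility>

// Final (i, j) after the original nest: i outer, j inner.
// The body (a[i][j] = 7 in the example) does not modify i or j, so it is omitted.
std::pair<int, int> finalIVsOriginal(int N, int M) {
  int i = 0, j = 0;
  for (i = 0; i < N; ++i)
    for (j = 0; j < M; ++j) {
    }
  return {i, j};
}

// Final (i, j) after the interchanged nest: j outer, i inner.
std::pair<int, int> finalIVsInterchanged(int N, int M) {
  int i = 0, j = 0;
  for (j = 0; j < M; ++j)
    for (i = 0; i < N; ++i) {
    }
  return {i, j};
}
```

With N = 3 and M = 4 both orders leave (3, 4); with M = 0 the original order leaves i = 3 while the interchanged order never assigns i past its initial value, so a regression test for this case should use positive trip counts.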

Hi Hal,
Updated the tests to add a test case covering the case where we have a use of a PHI outside the loop. I have added gi, gj as global variables and used them as induction variables in the loop to simulate this case.

It would be great if you could give me some input on writing test cases for this pass.
Currently the test cases I have added are all more or less similar (i.e. they get vectorized after interchange).
For checking loops that are just interchanged but not vectorized, do we have to check the exact instructions after interchange, or maybe check the PHI instruction order in the .ll (after interchange the induction PHIs will be in the reverse order)?

Thanks
Karthik Bhat

In D7499#123108, @karthikthecool wrote:

> Hi Hal,
> Updated the tests to add a test case covering the case where we have a use of a PHI outside the loop. I have added gi, gj as global variables and used them as induction variables in the loop to simulate this case.
>
> It would be great if you could give me some input on writing test cases for this pass.
> Currently the test cases I have added are all more or less similar (i.e. they get vectorized after interchange).
> For checking loops that are just interchanged but not vectorized, do we have to check the exact instructions after interchange, or maybe check the PHI instruction order in the .ll (after interchange the induction PHIs will be in the reverse order)?

Good point; I'd not looked carefully at the tests yet. We should not test this pass by using the vectorizer, but rather, should test the output of the interchange pass directly.

We don't need to make this harder than necessary, but, I think that for a few representative cases, we should check all of the relevant parts of the output. Then for cases that are structurally similar to those, checking the PHI order (or some other signature of the interchange) is fine.

We also should have negative tests (some loops that aren't quite tightly nested, maybe with some function call or extra memory access, etc.) and make sure they're not interchanged. We should also add tests for current limitations (like that loops with reductions are not interchanged), and put in some FIXME comments stating that these are just current limitations.

Feel free to borrow IR from the files in test/Analysis/DependenceAnalysis and adapt them as tests here.

Thanks
Karthik Bhat

Hi Hal,
Sorry for the delay in following up; I was on vacation.
Please find the updated and rebased patch. Added test cases as per your suggestion. I also verified the output of the programs on randomly generated arrays, and the outputs are the same before and after interchange in the cases where loops are interchanged.

I will start to work on a generic version of loop interchange (i.e. to support loops of any depth) after the initial version is committed.

Please let me know your inputs on this. Thanks again for your time and review. I really appreciate your help.

Regards
Karthik Bhat

hfinkel added inline comments. Feb 19 2015, 6:46 PM

test/Transforms/LoopInterchange/vectorize.ll
1	(On Diff #20150)	Don't run the vectorizer here. Just run interchange, verify that it does what it should, and if you want end-to-end coverage through the vectorizer, add a vectorizer regression test for the interchanged loops (likely, in the subfolder of the vectorizer's regression tests for your target of interest).

Hi Hal,
Thanks for the review.
Please find the updated patch. Moved the tests from vectorize.ll to profitability.ll and interchange.ll, which now check for the loop interchange itself as per your comments. Also added a negative test case to check the profitability model (i.e. where it is legal but not profitable to interchange).

Please let me know if this looks good for initial commit. This pass is currently disabled by default.

Thanks and Regards
Karthik Bhat

Hi Hal,
I had some time to work on a generic version of loop interchange that supports any depth. This updated patch supports loops of any depth.
The loop selection algorithm currently selects the innermost loop for interchange. Going forward we can improve this heuristic to select the most profitable loop based on the dependency matrix.
To keep the first version simple, loops with LCSSA PHIs are currently not handled. I will work on handling them in later iterations.
The legality and profitability logic is pretty much the same. We use the dependency matrix to decide whether interchanging 2 loops is legal.
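A dependency-matrix legality test for nests of any depth can be sketched as follows: rows are dependences, columns are loops (outermost first), and interchanging two loops swaps two columns, which is legal only if no row then has '>' as its first non-'=' entry. This is the standard textbook criterion, not necessarily the patch's exact logic. DepMatrix and the cutoff constants are hypothetical; the cutoff values are taken from the limits proposed later in this review (10 loops, 100 dependences).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dependency matrix: one row per dependence, one column per loop
// (outermost loop first). Entries are directions: '<', '=', '>' or '*'.
using DepMatrix = std::vector<std::string>;

// Cutoffs modelled on the limits proposed later in this review
// (10 loops / columns, 100 dependences / rows); past them we give up.
constexpr std::size_t MaxLoops = 10;
constexpr std::size_t MaxDeps = 100;

// A row is a valid dependence if its first non-'=' entry is '<'.
// '*' (unknown) is conservatively treated as possibly '>'.
bool isValidRow(const std::string &Row) {
  for (char C : Row) {
    if (C == '=')
      continue;
    return C == '<';
  }
  return true;
}

// Interchanging the loops at depths A and B swaps columns A and B in every
// row (M is taken by value, so the caller's matrix is untouched); the
// interchange is legal only if every resulting row is still valid.
bool isInterchangeLegal(DepMatrix M, std::size_t A, std::size_t B) {
  if (M.size() > MaxDeps)
    return false;
  for (std::string &Row : M) {
    if (Row.size() > MaxLoops || A >= Row.size() || B >= Row.size())
      return false;
    std::swap(Row[A], Row[B]);
    if (!isValidRow(Row))
      return false;
  }
  return true;
}
```

For the matrix {"=<>", "<=="}, swapping the two outer columns is legal, while swapping the two inner ones turns the first row into "=><" and is rejected.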

One of the TSVC benchmark test cases (s231) gives a 2X improvement with this patch.

I ran the LLVM LNT performance tests (following http://llvm.org/docs/lnt/quickstart.html#running-tests) with a sample size of 3, but every time I see a lot of variation in the results. I will try to run LNT with a larger sample size and update the results here.

It would be great if you could let me know your inputs on this patch.

P.S. Are there any build bots we can use to run LLVM LNT/performance tests for this patch?

Thanks and Regards
Karthik Bhat

> I had some time to work on a generic version of loop interchange that supports any depth. This updated patch supports loops of any depth.

Nice! A few comments...

include/llvm/Transforms/Scalar.h
144	You've un-improved this comment; please change it back.
lib/Transforms/IPO/PassManagerBuilder.cpp
276	I'd certainly like to have this on by default eventually, but we should be more conservative at first. Please add a command-line flag to enable this (there are several in this file already), so we can do further testing.
lib/Transforms/Scalar/LoopInterchange.cpp
12	You should use the 'cache-friendly memory access pattern' terminology here too.
51	Is this matrix generally sparse? (or could we make it sparse by picking some default). If so, is this the right data structure?
75	I'm somewhat worried about doing this eagerly for all loops; what if they're really large with lots of memory accesses? Maybe we should have a cutoff?
653	No need for { } here.
856	I don't really understand this comment. I think we can assume that LICM has run first. (and if this pass detects loop-invariant code better than LICM, that is another problem to fix, but not here).
917	No need for the { }

Hi Hal,
Please find my comments inline. Updated the patch as per the review comments and fixed a few issues found during LLVM LNT regression testing.
The current version of loop interchange gives about a 30% improvement in execution time in 2 benchmarks, because they contain code fragments like:

for (i = 0; i < _PB_N; i++)
 for (j = 0; j < _PB_N; j++)
   x[i] = x[i] + beta * A[j][i] * y[j];

which gets benefited after interchange.
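The benefit comes from the access pattern of A[j][i]: with i outermost, the inner j loop jumps a full row per iteration, while after interchange the inner i loop walks along a row. A sketch that only computes those element offsets for a row-major N x N array (the function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear (row-major, as in C) element offset of A[Row][Col] in an N x N array.
constexpr std::size_t offsetOf(std::size_t Row, std::size_t Col, std::size_t N) {
  return Row * N + Col;
}

// Offsets of A[j][i] visited by the innermost loop during one outer iteration.
// Original nest (i outer, j inner): the inner loop walks down a column.
std::vector<std::size_t> innerOffsetsOriginal(std::size_t I, std::size_t N) {
  std::vector<std::size_t> Offs;
  for (std::size_t J = 0; J < N; ++J)
    Offs.push_back(offsetOf(J, I, N));
  return Offs;
}

// Interchanged nest (j outer, i inner): the inner loop walks along a row.
std::vector<std::size_t> innerOffsetsInterchanged(std::size_t J, std::size_t N) {
  std::vector<std::size_t> Offs;
  for (std::size_t I = 0; I < N; ++I)
    Offs.push_back(offsetOf(J, I, N));
  return Offs;
}

// Element distance between consecutive inner-loop accesses
// (assumes the vector holds at least two offsets).
std::size_t stride(const std::vector<std::size_t> &Offs) {
  return Offs[1] - Offs[0];
}
```

For N = 8 the original inner loop has a stride of 8 elements and the interchanged one a stride of 1. The interchange also keeps the order in which each x[i] accumulates its terms (j still increases), so the result is unchanged, and y[j] becomes invariant in the new inner loop.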
There are a few compile-time regressions, which may be caused by the heavy legality checks in loop interchange. I will try to fix this in the next iteration.
Apart from this, we found a crash in the Dependency Analysis module, which I'm planning to fix separately as I need to understand it in more detail. Will raise a bug for it.

Thanks and Regards
Karthik Bhat

lib/Transforms/IPO/PassManagerBuilder.cpp
276	Sure. I had added it for performance testing and forgot to revert it before check-in. Will add a command-line flag and disable it by default.
lib/Transforms/Scalar/LoopInterchange.cpp
12	Done.
51	Yes, this matrix can be sparse depending on the dependences carried by the loop. I will look into this more. Have added a TODO for now.
75	Makes sense. Added a limit of at most 10 loops (columns in the dependency matrix) and 100 dependences (rows of the dependency matrix).
653	Done.
856	Hi Hal, consider the following matrix multiplication:
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
          A[i][j] = A[i][j] + B[i][k] * C[k][j];
In this example the direction vector would be [= = |<] (i.e. '=' dependence in i, '=' dependence in j, and a loop-independent dependence in k). The LICM pass would move the getelementptr for A[i][j] outside the inner loop, but it cannot move the complete statement outside the inner loop. Since the vectorizer only works on the innermost loop, the above code is not vectorized. But if we interchange the loops to
    for (int k = 0; k < N; k++)
      for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
          A[i][j] = A[i][j] + B[i][k] * C[k][j];
the loop now gets vectorized. It is mostly profitable to keep loop-independent dependences such as the above at the outermost possible level, and that is what we try to achieve here.
917	Done.

Hi Hal,
Please find the updated patch attached. Addressed the review comments and fixed a few issues in Loop Interchange found during LLVM LNT regression testing. After this change we see some improvement in the execution time of 2 benchmark test cases, as shown in the previous post. There are a few issues in Dependency Analysis; as you mentioned, I'm planning to address them separately after looking into the module in more detail. Hope that is fine?
Please let me know your inputs on the patch.
Thanks for your time and help.
Regards
Karthik Bhat

Hi All,
Rebased to trunk and updated the test cases to reflect recent changes in the IR format.

The patch to fix the crash in Dependency Analysis mentioned above has been submitted as D8059. With this patch we do not see any failures in LLVM LNT. As mentioned in previous comments, we see an execution-time improvement in 2 tests and a compile-time regression in a few test cases.

Please let me know if this is good for an initial check-in with the pass disabled by default. We still have some work to do in this pass, e.g.:

Add support for reductions and LCSSA PHIs.
Improve the profitability model.
Improve the loop selection algorithm.
Fix the compile-time regressions found in LLVM LNT due to this pass.
Fix issues in the Dependency Analysis module.

I would like to address them one by one on trunk, if everyone is OK with it.
Awaiting response.
Regards
Karthik Bhat

Thanks for continuing to work on this. I have a few minor comments below, but we can move this in-tree. Please go ahead and commit, and we'll continue to iterate/test. When you commit, please commit the change to lib/Analysis/DependenceAnalysis.cpp separately.

lib/Transforms/Scalar/LoopInterchange.cpp
515	ENABLE_DEBUGGING is too generic for this. How about calling this: DUMP_DEP_MATRICIES
547	Please make this TODO more specific. What happens now and what should happen instead?
688	Please make this more specific. We do handle anti deps. What needs to happen?

This revision is now accepted and ready to land. Mar 5 2015, 7:23 PM

Thanks Hal. Committed as r231458 after implementing review comments. Will raise a review for DependencyAnalysis fix shortly.
Thanks and Regards
Karthik Bhat

karthikthecool mentioned this in D7432: [RFC] Loop Interchane Pass. Mar 8 2015, 9:52 PM

Revision Contents

Path (size)
include/llvm/InitializePasses.h (1 line)
include/llvm/LinkAllPasses.h (1 line)
include/llvm/Transforms/Scalar.h (7 lines)
lib/Analysis/DependenceAnalysis.cpp (9 lines)
lib/Transforms/IPO/PassManagerBuilder.cpp (9 lines)
lib/Transforms/Scalar/CMakeLists.txt (1 line)
lib/Transforms/Scalar/LoopInterchange.cpp (1288 lines)
lib/Transforms/Scalar/Scalar.cpp (1 line)
test/Transforms/LoopInterchange/currentLimitation.ll (58 lines)
test/Transforms/LoopInterchange/interchange.ll (557 lines)
test/Transforms/LoopInterchange/profitability.ll (205 lines)

Diff 21271

include/llvm/InitializePasses.h

@@ ... @@
 void initializeLiveRegMatrixPass(PassRegistry&);
 void initializeLiveStacksPass(PassRegistry&);
 void initializeLiveVariablesPass(PassRegistry&);
 void initializeLoaderPassPass(PassRegistry&);
 void initializeLocalStackSlotPassPass(PassRegistry&);
 void initializeLoopDeletionPass(PassRegistry&);
 void initializeLoopExtractorPass(PassRegistry&);
 void initializeLoopInfoWrapperPassPass(PassRegistry&);
+void initializeLoopInterchangePass(PassRegistry &);
 void initializeLoopInstSimplifyPass(PassRegistry&);
 void initializeLoopRotatePass(PassRegistry&);
 void initializeLoopSimplifyPass(PassRegistry&);
 void initializeLoopStrengthReducePass(PassRegistry&);
 void initializeGlobalMergePass(PassRegistry&);
 void initializeLoopRerollPass(PassRegistry&);
 void initializeLoopUnrollPass(PassRegistry&);
 void initializeLoopUnswitchPass(PassRegistry&);

include/llvm/LinkAllPasses.h

@@ ... @@ ForcePassLinking() {
       (void) llvm::createInductiveRangeCheckEliminationPass();
       (void) llvm::createIndVarSimplifyPass();
       (void) llvm::createInstructionCombiningPass();
       (void) llvm::createInternalizePass();
       (void) llvm::createLCSSAPass();
       (void) llvm::createLICMPass();
       (void) llvm::createLazyValueInfoPass();
       (void) llvm::createLoopExtractorPass();
+      (void)llvm::createLoopInterchangePass();
       (void) llvm::createLoopSimplifyPass();
       (void) llvm::createLoopStrengthReducePass();
       (void) llvm::createLoopRerollPass();
       (void) llvm::createLoopUnrollPass();
       (void) llvm::createLoopUnswitchPass();
       (void) llvm::createLoopIdiomPass();
       (void) llvm::createLoopRotatePass();
       (void) llvm::createLowerExpectIntrinsicPass();

include/llvm/Transforms/Scalar.h

@@ ... @@
 //===----------------------------------------------------------------------===//
 //
 // LICM - This pass is a loop invariant code motion and memory promotion pass.
 //
 Pass *createLICMPass();
 
 //===----------------------------------------------------------------------===//
 //
+// LoopInterchange - This pass interchanges loops to provide a more
+// cache-friendly memory access patterns.
+//
+Pass *createLoopInterchangePass();
+
+//===----------------------------------------------------------------------===//
+//
 // LoopStrengthReduce - This pass is strength reduces GEP instructions that use
 // a loop's canonical induction variable as one of their indices.
 //
 Pass *createLoopStrengthReducePass();
 
 Pass *createGlobalMergePass(const TargetMachine *TM, unsigned MaximalOffset);
 
 //===----------------------------------------------------------------------===//

lib/Analysis/DependenceAnalysis.cpp

Show First 20 Lines • Show All 3,354 Lines • ▼ Show 20 Lines	DependenceAnalysis::depends(Instruction Src, Instruction Dst,
if (UsefulGEP) {		if (UsefulGEP) {
DEBUG(dbgs() << " using GEPs\n");		DEBUG(dbgs() << " using GEPs\n");
unsigned P = 0;		unsigned P = 0;
for (GEPOperator::const_op_iterator SrcIdx = SrcGEP->idx_begin(),		for (GEPOperator::const_op_iterator SrcIdx = SrcGEP->idx_begin(),
SrcEnd = SrcGEP->idx_end(),		SrcEnd = SrcGEP->idx_end(),
DstIdx = DstGEP->idx_begin();		DstIdx = DstGEP->idx_begin();
SrcIdx != SrcEnd;		SrcIdx != SrcEnd;
++SrcIdx, ++DstIdx, ++P) {		++SrcIdx, ++DstIdx, ++P) {
		// If not isSCEVable return dependency as not computable.
		if (!SE->isSCEVable((*SrcIdx)->getType()) \|\|
		!SE->isSCEVable((*DstIdx)->getType())) {
		return make_unique<Dependence>(Src, Dst);
		}
Pair[P].Src = SE->getSCEV(*SrcIdx);		Pair[P].Src = SE->getSCEV(*SrcIdx);
Pair[P].Dst = SE->getSCEV(*DstIdx);		Pair[P].Dst = SE->getSCEV(*DstIdx);
unifySubscriptType(&Pair[P]);		unifySubscriptType(&Pair[P]);
}		}
}		}
else {		else {
DEBUG(dbgs() << " ignoring GEPs\n");		DEBUG(dbgs() << " ignoring GEPs\n");
const SCEV *SrcSCEV = SE->getSCEV(SrcPtr);		const SCEV *SrcSCEV = SE->getSCEV(SrcPtr);
▲ Show 20 Lines • Show All 410 Lines • ▼ Show 20 Lines	const SCEV *DependenceAnalysis::getSplitIteration(const Dependence &Dep,
SmallVector<Subscript, 4> Pair(Pairs);		SmallVector<Subscript, 4> Pair(Pairs);
if (UsefulGEP) {		if (UsefulGEP) {
unsigned P = 0;		unsigned P = 0;
for (GEPOperator::const_op_iterator SrcIdx = SrcGEP->idx_begin(),		for (GEPOperator::const_op_iterator SrcIdx = SrcGEP->idx_begin(),
SrcEnd = SrcGEP->idx_end(),		SrcEnd = SrcGEP->idx_end(),
DstIdx = DstGEP->idx_begin();		DstIdx = DstGEP->idx_begin();
SrcIdx != SrcEnd;		SrcIdx != SrcEnd;
++SrcIdx, ++DstIdx, ++P) {		++SrcIdx, ++DstIdx, ++P) {
		if (!SE->isSCEVable((*SrcIdx)->getType()) \|\|
		!SE->isSCEVable((*DstIdx)->getType())) {
		return nullptr;
		}
Pair[P].Src = SE->getSCEV(*SrcIdx);		Pair[P].Src = SE->getSCEV(*SrcIdx);
Pair[P].Dst = SE->getSCEV(*DstIdx);		Pair[P].Dst = SE->getSCEV(*DstIdx);
}		}
}		}
else {		else {
const SCEV *SrcSCEV = SE->getSCEV(SrcPtr);		const SCEV *SrcSCEV = SE->getSCEV(SrcPtr);
const SCEV *DstSCEV = SE->getSCEV(DstPtr);		const SCEV *DstSCEV = SE->getSCEV(DstPtr);
Pair[0].Src = SrcSCEV;		Pair[0].Src = SrcSCEV;

lib/Transforms/IPO/PassManagerBuilder.cpp

static cl::opt<bool> UseCFLAA("use-cfl-aa",		static cl::opt<bool> UseCFLAA("use-cfl-aa",
cl::init(false), cl::Hidden,		cl::init(false), cl::Hidden,
cl::desc("Enable the new, experimental CFL alias analysis"));		cl::desc("Enable the new, experimental CFL alias analysis"));

static cl::opt<bool>		static cl::opt<bool>
EnableMLSM("mlsm", cl::init(true), cl::Hidden,		EnableMLSM("mlsm", cl::init(true), cl::Hidden,
cl::desc("Enable motion of merged load and store"));		cl::desc("Enable motion of merged load and store"));

		static cl::opt<bool> EnableLoopInterchange(
		"enable-loopinterchange", cl::init(false), cl::Hidden,
		cl::desc("Enable the new, experimental LoopInterchange Pass"));

PassManagerBuilder::PassManagerBuilder() {		PassManagerBuilder::PassManagerBuilder() {
OptLevel = 2;		OptLevel = 2;
SizeLevel = 0;		SizeLevel = 0;
LibraryInfo = nullptr;		LibraryInfo = nullptr;
Inliner = nullptr;		Inliner = nullptr;
DisableTailCalls = false;		DisableTailCalls = false;
DisableUnitAtATime = false;		DisableUnitAtATime = false;
DisableUnrollLoops = false;		DisableUnrollLoops = false;
	void PassManagerBuilder::populateModulePassManager(
// Rotate Loop - disable header duplication at -Oz		// Rotate Loop - disable header duplication at -Oz
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));		MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
MPM.add(createLICMPass()); // Hoist loop invariants		MPM.add(createLICMPass()); // Hoist loop invariants
MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));		MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
MPM.add(createInstructionCombiningPass());		MPM.add(createInstructionCombiningPass());
MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars		MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars
MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.		MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.
MPM.add(createLoopDeletionPass()); // Delete dead loops		MPM.add(createLoopDeletionPass()); // Delete dead loops
		if (EnableLoopInterchange)
		MPM.add(createLoopInterchangePass()); // Interchange loops

if (!DisableUnrollLoops)		if (!DisableUnrollLoops)
MPM.add(createSimpleLoopUnrollPass()); // Unroll small loops		MPM.add(createSimpleLoopUnrollPass()); // Unroll small loops
addExtensionsToPM(EP_LoopOptimizerEnd, MPM);		addExtensionsToPM(EP_LoopOptimizerEnd, MPM);

if (OptLevel > 1) {		if (OptLevel > 1) {
if (EnableMLSM)		if (EnableMLSM)
MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds		MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds
	void PassManagerBuilder::populateModulePassManager(
MPM.add(createInstructionCombiningPass());		MPM.add(createInstructionCombiningPass());
addExtensionsToPM(EP_Peephole, MPM);		addExtensionsToPM(EP_Peephole, MPM);
MPM.add(createJumpThreadingPass()); // Thread jumps		MPM.add(createJumpThreadingPass()); // Thread jumps
MPM.add(createCorrelatedValuePropagationPass());		MPM.add(createCorrelatedValuePropagationPass());
MPM.add(createDeadStoreEliminationPass()); // Delete dead stores		MPM.add(createDeadStoreEliminationPass()); // Delete dead stores
MPM.add(createLICMPass());		MPM.add(createLICMPass());

addExtensionsToPM(EP_ScalarOptimizerLate, MPM);		addExtensionsToPM(EP_ScalarOptimizerLate, MPM);

		hfinkel: I'd certainly like to have this on by default eventually, but we should be more conservative at first. Please add a command-line flag to enable this (there are several in this file already), so we can do further testing.
		karthikthecool (Author): Sure. I had added it for performance testing and forgot to revert it before check-in. Will add a command-line flag and disable it by default.
if (RerollLoops)		if (RerollLoops)
MPM.add(createLoopRerollPass());		MPM.add(createLoopRerollPass());
if (!RunSLPAfterLoopVectorization) {		if (!RunSLPAfterLoopVectorization) {
if (SLPVectorize)		if (SLPVectorize)
MPM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.		MPM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.

if (BBVectorize) {		if (BBVectorize) {
MPM.add(createBBVectorizePass());		MPM.add(createBBVectorizePass());
	void PassManagerBuilder::addLTOOptimizationPasses(legacy::PassManagerBase &PM) {
PM.add(createMemCpyOptPass()); // Remove dead memcpys.		PM.add(createMemCpyOptPass()); // Remove dead memcpys.

// Nuke dead stores.		// Nuke dead stores.
PM.add(createDeadStoreEliminationPass());		PM.add(createDeadStoreEliminationPass());

// More loops are countable; try to optimize them.		// More loops are countable; try to optimize them.
PM.add(createIndVarSimplifyPass());		PM.add(createIndVarSimplifyPass());
PM.add(createLoopDeletionPass());		PM.add(createLoopDeletionPass());
		if (EnableLoopInterchange)
		PM.add(createLoopInterchangePass());

PM.add(createLoopVectorizePass(true, LoopVectorize));		PM.add(createLoopVectorizePass(true, LoopVectorize));

// More scalar chains could be vectorized due to more alias information		// More scalar chains could be vectorized due to more alias information
if (RunSLPAfterLoopVectorization)		if (RunSLPAfterLoopVectorization)
if (SLPVectorize)		if (SLPVectorize)
PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.		PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.

// After vectorization, assume intrinsics may tell us more about pointer		// After vectorization, assume intrinsics may tell us more about pointer

lib/Transforms/Scalar/CMakeLists.txt

	add_llvm_library(LLVMScalarOpts
InductiveRangeCheckElimination.cpp		InductiveRangeCheckElimination.cpp
IndVarSimplify.cpp		IndVarSimplify.cpp
JumpThreading.cpp		JumpThreading.cpp
LICM.cpp		LICM.cpp
LoadCombine.cpp		LoadCombine.cpp
LoopDeletion.cpp		LoopDeletion.cpp
LoopIdiomRecognize.cpp		LoopIdiomRecognize.cpp
LoopInstSimplify.cpp		LoopInstSimplify.cpp
		LoopInterchange.cpp
LoopRerollPass.cpp		LoopRerollPass.cpp
LoopRotation.cpp		LoopRotation.cpp
LoopStrengthReduce.cpp		LoopStrengthReduce.cpp
LoopUnrollPass.cpp		LoopUnrollPass.cpp
LoopUnswitch.cpp		LoopUnswitch.cpp
LowerAtomic.cpp		LowerAtomic.cpp
LowerExpectIntrinsic.cpp		LowerExpectIntrinsic.cpp
MemCpyOptimizer.cpp		MemCpyOptimizer.cpp

lib/Transforms/Scalar/LoopInterchange.cpp

				//===- LoopInterchange.cpp - Loop interchange pass------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass handles the loop interchange transform.
				// It interchanges loops to provide more cache-friendly memory access
				// patterns.
				hfinkel: You should use the 'cache-friendly memory access pattern' terminology here too.
				karthikthecool (Author): Done.
				//
				//===----------------------------------------------------------------------===//
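As a rough, non-LLVM illustration of the access-pattern change this pass targets, the sketch below uses the loop from later in this review, A[j][i] = A[j][i] + k, on a row-major C++ array. The array size `N`, the `Grid` alias and both function names are assumptions made for the example. In the original order the inner loop strides across rows; in the interchanged order it walks adjacent elements. Both orders leave the array in the same final state.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 100;
using Grid = std::array<std::array<int, N>, N>;

// Original order: the inner loop varies the first (row) index, so
// consecutive iterations touch elements N ints apart in memory.
void addColumnWise(Grid &A, int K) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      A[j][i] = A[j][i] + K;
}

// Interchanged order: the inner loop varies the last (column) index, so
// consecutive iterations touch adjacent ints, which is cache-friendly.
void addRowWise(Grid &A, int K) {
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t i = 0; i < N; ++i)
      A[j][i] = A[j][i] + K;
}
```

Interchange is only valid here because every iteration updates a distinct element; the legality checks below exist to establish that kind of property from the dependence matrix.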

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/Analysis/AliasSetTracker.h"
				#include "llvm/Analysis/AssumptionCache.h"
				#include "llvm/Analysis/BlockFrequencyInfo.h"
				#include "llvm/Analysis/CodeMetrics.h"
				#include "llvm/Analysis/DependenceAnalysis.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/LoopIterator.h"
				#include "llvm/Analysis/LoopPass.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/ScalarEvolutionExpander.h"
				#include "llvm/Analysis/ScalarEvolutionExpressions.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Transforms/Utils/SSAUpdater.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"
				using namespace llvm;

				#define DEBUG_TYPE "loop-interchange"

				namespace {

				typedef SmallVector<Loop *, 8> LoopVector;

				hfinkel: Please add a comment explaining what this function computes?
				karthikthecool (Author): This function gets the maximum nesting level of the innermost loop. We use this to push loops of depth 2 to the worklist. For example, for(int i=0;i<N;i++) for(int j=0;j<M;j++) for(int k=0;k<K;k++) here we want to return 3 as the max nesting level is 3. I have renamed the function, added a comment, and modified it a bit to correctly return the max loop depth when there are multiple inner loops. For example, for(int i=0;i<N;i++) { for(int j=0;j<M;j++) { for(int k=0;k<K;k++) { // this loop has depth 3 } } for(int k=0;k<K;k++) { // this loop has depth 2 } } In the above case we still return 3 as it is the max depth.
				// TODO: Check if we can use a sparse matrix here.
				hfinkel: Is this matrix generally sparse? (or could we make it sparse by picking some default). If so, is this the right data structure?
				karthikthecool (Author): Yes, this matrix can be sparse depending on the dependences carried by the loop. I will look into this further. Have added a TODO for now.
				typedef std::vector<std::vector<char>> CharMatrix;

				// Maximum number of dependencies that can be handled in the dependency matrix.
				static const unsigned MaxMemInstrCount = 100;
				hfinkel: Did you mean += ?

				// Maximum loop depth supported.
				static const unsigned MaxLoopNestDepth = 10;

				class LoopInterchange;
				// Returns the maximum loop nesting depth of the loop nest rooted at L. This
				// is used to populate the worklist with loop nests of depth 2.
				hfinkel: LoopInfo already has a getLoopDepth() function. Can you use that?
				karthikthecool (Author): getLoopDepth currently only returns the nesting level of the current loop. Since we only have the outer loop here, it always returns 1, so I had to go with a recursive function instead.
				unsigned getMaxNestingLevel(Loop &L) {
				unsigned Level = 0;
				if (L.empty())
				return 1;
				for (Loop *InnerL : L)
				Level = std::max(Level, getMaxNestingLevel(*InnerL) + 1);
				return Level;
				}
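To make the recursion in getMaxNestingLevel concrete, here is a standalone sketch on a hypothetical `ToyLoop` tree (a stand-in for llvm::Loop; no LLVM API is used). It reproduces the second example from the review thread: an i loop containing a j/k nest plus a sibling k loop has a maximum nesting level of 3.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::Loop: each loop just owns its subloops.
struct ToyLoop {
  std::vector<ToyLoop> SubLoops;
};

// Same recursion as getMaxNestingLevel: a loop with no subloops has level 1;
// otherwise the level is one more than that of the deepest subloop.
unsigned maxNestingLevel(const ToyLoop &L) {
  if (L.SubLoops.empty())
    return 1;
  unsigned Level = 0;
  for (const ToyLoop &Inner : L.SubLoops)
    Level = std::max(Level, maxNestingLevel(Inner) + 1);
  return Level;
}
```

Taking the maximum over all subloops, rather than following only the first one, is what makes the sibling k loop (depth 2) not hide the deeper j/k nest (depth 3).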

				void printDepMatrix(CharMatrix &DepMatrix) {
				#ifdef ENABLE_DEBUGGING
				for (auto I = DepMatrix.begin(), E = DepMatrix.end(); I != E; ++I) {
				const std::vector<char> &Vec = *I;
				hfinkel: I'm somewhat worried about doing this eagerly for all loops; what if they're really large with lots of memory accesses? Maybe we should have a cutoff?
				karthikthecool (Author): Makes sense. Added a limit of at most 10 loops (columns in the dependency matrix) and 100 dependencies (rows in the dependency matrix).
				for (auto II = Vec.begin(), EE = Vec.end(); II != EE; ++II)
				hfinkel: This is not actually what you want. If the loop is branched to by, for example, multiple entries of the switch statement, the predecessor can be listed multiple times in the predecessor list (and, thus, you'll have more than two incoming values even though you have only 2 predecessor blocks). I suspect that what you actually want is that there is a unique latch and a unique predecessor, so you want that L->getLoopLatch() && L->getLoopPredecessor() [neither are nullptr].
				karthikthecool (Author): Updated the code. Thanks for clarifying the problem.
				DEBUG(dbgs() << *II << " ");
				DEBUG(dbgs() << "\n");
				}
				#endif
				}

				hfinkel: Do you also need to check that AddRec->isAffine()?
				karthikthecool (Author): Yes, I feel we need to check isAffine as well. Thanks, updated the code.
				bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level, Loop *L,
				DependenceAnalysis *DA) {
				typedef SmallVector<Value *, 16> ValueVector;
				ValueVector MemInstr;

				hfinkel: Let's say: // FIXME: Handle loops with more than one induction variable. Note that, currently, legality makes sure we have only one induction variable.
				karthikthecool (Author): OK. Done.
				if (Level > MaxLoopNestDepth) {
				DEBUG(dbgs() << "Cannot handle loops of depth greater than "
				<< MaxLoopNestDepth << "\n");
				return false;
				}

				// For each block.
				for (Loop::block_iterator BB = L->block_begin(), BE = L->block_end();
				BB != BE; ++BB) {
				// Scan the BB and collect legal loads and stores.
				for (BasicBlock::iterator I = (*BB)->begin(), E = (*BB)->end(); I != E;
				++I) {
				Instruction *Ins = dyn_cast<Instruction>(I);
				if (!Ins)
				return false;
				LoadInst *Ld = dyn_cast<LoadInst>(I);
				StoreInst *St = dyn_cast<StoreInst>(I);
				if (!St && !Ld)
				continue;
				if (Ld && !Ld->isSimple())
				return false;
				if (St && !St->isSimple())
				return false;
				MemInstr.push_back(I);
				}
				}

				DEBUG(dbgs() << "Found " << MemInstr.size()
				<< " Loads and Stores to analyze\n");

				ValueVector::iterator I, IE, J, JE;

				for (I = MemInstr.begin(), IE = MemInstr.end(); I != IE; ++I) {
				for (J = I, JE = MemInstr.end(); J != JE; ++J) {
				std::vector<char> Dep;
				// MemInstr only holds loads and stores, so these casts cannot fail.
				Instruction *Src = cast<Instruction>(*I);
				Instruction *Des = cast<Instruction>(*J);
				if (Src == Des)
				continue;
				if (isa<LoadInst>(Src) && isa<LoadInst>(Des))
				continue;
				if (auto D = DA->depends(Src, Des, true)) {
				DEBUG(dbgs() << "Found Dependency between Src=" << *Src << " Des=" << *Des
				<< "\n");
				if (D->isFlow()) {
				// TODO: Handle Flow dependence
				DEBUG(dbgs() << "Flow dependence not handled\n");
				return false;
				}
				if (D->isAnti()) {
				DEBUG(dbgs() << "Found Anti dependence \n");
				unsigned Levels = D->getLevels();
				char Direction;
				for (unsigned II = 1; II <= Levels; ++II) {
				const SCEV *Distance = D->getDistance(II);
				const SCEVConstant *SCEVConst =
				dyn_cast_or_null<SCEVConstant>(Distance);
				if (SCEVConst) {
				const ConstantInt *CI = SCEVConst->getValue();
				if (CI->isNegative())
				Direction = '<';
				else if (CI->isZero())
				Direction = '=';
				else
				Direction = '>';
				Dep.push_back(Direction);
				} else if (D->isScalar(II)) {
				Direction = 'S';
				Dep.push_back(Direction);
				} else {
				unsigned Dir = D->getDirection(II);
				if (Dir == Dependence::DVEntry::LT \|\|
				Dir == Dependence::DVEntry::LE)
				Direction = '<';
				else if (Dir == Dependence::DVEntry::GT \|\|
				Dir == Dependence::DVEntry::GE)
				Direction = '>';
				else if (Dir == Dependence::DVEntry::EQ)
				Direction = '=';
				else
				Direction = '*';
				Dep.push_back(Direction);
				}
				}
				while (Dep.size() != Level) {
				Dep.push_back('I');
				}

				DepMatrix.push_back(Dep);
				if (DepMatrix.size() > MaxMemInstrCount) {
				DEBUG(dbgs() << "Cannot handle more than " << MaxMemInstrCount
				<< " dependencies inside loop\n");
				return false;
				}
				}
				}
				}
				}

				// Without a dependency matrix we cannot check legality; return false.
				if (DepMatrix.size() == 0)
				return false;
				return true;
				}
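When the dependence distance at a level is a known constant, populateDependencyMatrix maps it to a direction character: negative to '<', zero to '=', positive to '>'. It then pads the row with 'I' up to Level entries. The sketch below isolates that mapping. The helper name `directionRow` and the use of `std::optional<long>` in place of SCEVConstant are assumptions. An unknown distance simply becomes '*' here, whereas the patch first consults isScalar() and getDirection().

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper mirroring the constant-distance branch of
// populateDependencyMatrix: D < 0 -> '<', D == 0 -> '=', D > 0 -> '>',
// unknown -> '*'. The row is then padded with 'I' up to Level entries.
std::vector<char> directionRow(const std::vector<std::optional<long>> &Distances,
                               unsigned Level) {
  std::vector<char> Dep;
  for (const std::optional<long> &D : Distances) {
    if (!D)
      Dep.push_back('*');
    else if (*D < 0)
      Dep.push_back('<');
    else if (*D == 0)
      Dep.push_back('=');
    else
      Dep.push_back('>');
  }
  while (Dep.size() < Level)
    Dep.push_back('I');
  return Dep;
}
```

The padding loop uses `<` rather than the patch's `!=`, so it cannot spin forever if more levels are reported than Level.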

				// A loop is moved from index FromIndx to index ToIndx. Update the dependence
				// matrix by exchanging the two columns.
				void interChangeDepedencies(CharMatrix &DepMatrix, unsigned FromIndx,
				unsigned ToIndx) {
				unsigned numRows = DepMatrix.size();
				for (unsigned i = 0; i < numRows; ++i) {
				char TmpVal = DepMatrix[i][ToIndx];
				DepMatrix[i][ToIndx] = DepMatrix[i][FromIndx];
				DepMatrix[i][FromIndx] = TmpVal;
				}
				}

				// Checks whether the outermost entry in Row (up to and including Column)
				// that is not '=', 'S' or 'I' is '>'.
				bool isOuterMostDepPositive(CharMatrix &DepMatrix, unsigned Row,
				unsigned Column) {
				for (unsigned i = 0; i <= Column; ++i) {
				if (DepMatrix[Row][i] == '<')
				return false;
				if (DepMatrix[Row][i] == '>')
				return true;
				hfinkel: I'd move this FIXME comment somewhere else, it is not particularly useful here. It is more useful to tag places that assume only two levels.
				karthikthecool (Author): OK..
				}
				// All dependencies were '=','S' or 'I'
				return false;
				}

				// Checks that no dependence exists in Row before Column, i.e. every entry
				// before Column is '=', 'S' or 'I'.
				bool containsNoDependence(CharMatrix &DepMatrix, unsigned Row,
				unsigned Column) {
				for (unsigned i = 0; i < Column; ++i) {
				if (DepMatrix[Row][i] != '=' && DepMatrix[Row][i] != 'S' &&
				DepMatrix[Row][i] != 'I')
				return false;
				}
				return true;
				}

				bool validDepInterchange(CharMatrix &DepMatrix, unsigned Row,
				unsigned OuterLoopId, char InnerDep, char OuterDep) {

				if (isOuterMostDepPositive(DepMatrix, Row, OuterLoopId))
				return false;

				if (InnerDep == OuterDep)
				return true;

				// It is legal to interchange if and only if after interchange no row has a
				// '>' direction as the leftmost non-'='.

				if (InnerDep == '=' \|\| InnerDep == 'S' \|\| InnerDep == 'I')
				return true;

				if (InnerDep == '<')
				return true;

				if (InnerDep == '>') {
				// If OuterLoopId is the outermost loop, interchanging will make the first
				// dependency '>'.
				if (OuterLoopId == 0)
				return false;

				// If all dependencies before OuterLoopId are '=', 'S' or 'I', then
				// interchanging will leave this row with '>' as its outermost non-'='
				// dependency, so it is legal only if some earlier entry is a dependence.
				if (!containsNoDependence(DepMatrix, Row, OuterLoopId))
				return true;
				}

				return false;
				}
				hfinkel: Either you should handle the case where the dyn_cast fails, or if it can't fail (because we've already verified that this must be a BranchInst), then use cast<> instead. This same comment applies to many places below as well. Only use dyn_cast if the cast can fail (in which case you should handle the nullptr case). Otherwise, use cast<>.
				karthikthecool (Author): Updated code to use cast<> wherever possible. Added null checks in places where dyn_cast is being used.

				// Checks if it is legal to interchange 2 loops.
				// [Theorem] A permutation of the loops in a perfect nest is legal if and only if
				// the direction matrix, after the same permutation is applied to its columns,
				// has no ">" direction as the leftmost non-"=" direction in any row.
				bool isLegalToInterChangeLoops(CharMatrix &DepMatrix, unsigned InnerLoopId,
				unsigned OuterLoopId) {

				unsigned NumRows = DepMatrix.size();
				// For each row check if it is valid to interchange.
				for (unsigned Row = 0; Row < NumRows; ++Row) {
				char InnerDep = DepMatrix[Row][InnerLoopId];
				hfinkel: Space before 'Any'
				karthikthecool (Author): Modified comment.
				char OuterDep = DepMatrix[Row][OuterLoopId];
				if (InnerDep == '*' || OuterDep == '*')
				hfinkel: licm -> LICM
				karthikthecool (Author): Modified comment.
				return false;
				else if (!validDepInterchange(DepMatrix, Row, OuterLoopId, InnerDep,
				OuterDep))
				return false;
				}
				return true;
				}
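The theorem stated in the comment on isLegalToInterChangeLoops can be checked directly on a small matrix. Swap the two columns, then reject the interchange if any row's leftmost entry other than '=', 'S' or 'I' is not '<'. This is a minimal sketch of the theorem itself, not of the patch's row-by-row validDepInterchange logic. The name `isLegalAfterSwap` is hypothetical, and '*' is treated conservatively as if it could be '>'.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using CharMatrix = std::vector<std::vector<char>>;

// Sketch of the permutation test: swap the two loop columns (the matrix is
// taken by value, so the caller's copy is untouched), then require that in
// every row the leftmost carried direction is '<'.
bool isLegalAfterSwap(CharMatrix DepMatrix, unsigned OuterId, unsigned InnerId) {
  for (std::vector<char> &Row : DepMatrix) {
    std::swap(Row[OuterId], Row[InnerId]);
    for (char C : Row) {
      if (C == '=' || C == 'S' || C == 'I')
        continue;   // not a carried direction; keep scanning to the right
      if (C == '<')
        break;      // leftmost carried direction is '<': this row is fine
      return false; // '>' or '*' (which may be '>'): interchange is illegal
    }
  }
  return true;
}
```

For example, the row ('<', '>') is a legal order as written but becomes ('>', '<') after the swap, so interchanging that pair of loops is rejected.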

				static void populateWorklist(Loop &L, SmallVector<LoopVector, 8> &V) {

				DEBUG(dbgs() << "Calling populateWorklist\n");
				LoopVector LoopList;
				Loop *CurrentLoop = &L;
				std::vector<Loop *> vec = CurrentLoop->getSubLoopsVector();
				while (vec.size() != 0) {
				// The current loop has multiple subloops, hence it is not tightly
				// nested.
				// Discard all loops above it that were collected for the worklist.
				if (vec.size() != 1) {
				LoopList.clear();
				return;
				}
				hfinkel: What are you actually trying to check here? Instructions with side effects? Maybe you want I->mayHaveSideEffects()?
				karthikthecool (Author): Hi Hal, the way I'm trying to conclude that a loop is tightly nested is as follows: 1) There should not be any extra block between the outer loop and the inner loop (i.e. in this case the outer loop header would branch to the inner loop preheader/inner loop body, and the other branch in the header would go to the outer loop latch). With this check we can catch loops which have a block in between the outer and inner loop, such as - for(int i=0;i<N;i++) { if(X) { } for(int j=0;j<N;j++) { } } and conclude these are not tightly nested. 2) A second type of non-nested loop can be - for(int i=0;i<N;i++) { a = i; k = A[i]; for(int j=0;j<N;j++) { } } These will be caught by the second check, which checks that the induction variable has a single use in the latch or header, namely the operand of the induction PHI (i.e. the one used to increment/decrement the loop counter). I have modified this function a bit in the updated patch. 3) In the third case I was trying to catch loops such as - for(int i=0;i<N;i++) { foo(); for(int j=0;j<N;j++) { } } I think we can do it using a combination of mayHaveSideEffects and mayReadFromMemory. Updated the patch.
				LoopList.push_back(CurrentLoop);
				CurrentLoop = *(vec.begin());
				vec = CurrentLoop->getSubLoopsVector();
				}
				LoopList.push_back(CurrentLoop);
				V.push_back(LoopList);
				}
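populateWorklist walks down a chain of loops that each have exactly one subloop. It records the nest only if it reaches the innermost loop without meeting a loop that has several subloops. The sketch below shows the same walk on a hypothetical `ToyLoop` tree that carries an illustrative `Name`. `populateToyWorklist` is likewise a made-up name, and nothing here is LLVM API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Loop, with a name for easy inspection.
struct ToyLoop {
  std::string Name;
  std::vector<ToyLoop> SubLoops;
};

// Mirrors populateWorklist: follow single-subloop links from L down to the
// innermost loop. If any loop on the way has two or more subloops, the nest
// is not tightly nested and nothing is added to the worklist.
void populateToyWorklist(const ToyLoop &L,
                         std::vector<std::vector<std::string>> &Worklist) {
  std::vector<std::string> LoopList;
  const ToyLoop *Current = &L;
  while (!Current->SubLoops.empty()) {
    if (Current->SubLoops.size() != 1)
      return;
    LoopList.push_back(Current->Name);
    Current = &Current->SubLoops.front();
  }
  LoopList.push_back(Current->Name);
  Worklist.push_back(LoopList);
}
```

Returning early on a loop with several subloops matches the patch's LoopList.clear(); return; path: such a nest contributes nothing to the worklist.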

				static PHINode *getInductionVariable(Loop *L, ScalarEvolution *SE) {
				PHINode *InnerIndexVar = L->getCanonicalInductionVariable();
				if (InnerIndexVar)
				return InnerIndexVar;
				hfinkel: These checks look identical to those above, please make a function (a lambda function is fine).
				karthikthecool (Author): Done.
				if (L->getLoopLatch() == nullptr || L->getLoopPredecessor() == nullptr)
				return nullptr;
				for (BasicBlock::iterator I = L->getHeader()->begin(); isa<PHINode>(I); ++I) {
				PHINode *PhiVar = cast<PHINode>(I);
				Type *PhiTy = PhiVar->getType();
				if (!PhiTy->isIntegerTy() && !PhiTy->isFloatingPointTy() &&
				!PhiTy->isPointerTy())
				return nullptr;
				const SCEVAddRecExpr *AddRec =
				dyn_cast<SCEVAddRecExpr>(SE->getSCEV(PhiVar));
				if (!AddRec || !AddRec->isAffine())
				continue;
				const SCEV *Step = AddRec->getStepRecurrence(*SE);
				const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);
				if (!C)
				continue;
				// Found the induction variable.
				// FIXME: Handle loops with more than one induction variable. Note that,
				// currently, legality makes sure we have only one induction variable.
				return PhiVar;
				}
				return nullptr;
				}

				/// LoopInterchangeLegality checks if it is legal to interchange the loop.
				class LoopInterchangeLegality {
				public:
				LoopInterchangeLegality(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
				DependenceAnalysis *DA, LoopInterchange *Pass)
				: OuterLoop(Outer), InnerLoop(Inner), SE(SE), DA(DA), Parent(Pass) {}

				/// Check if the loops can be interchanged.
				bool canInterchangeLoops(unsigned InnerLoopId, unsigned OuterLoopId,
				CharMatrix &DepMatrix);
				/// Check if the loop structure is understood. We do not handle triangular
				/// loops for now.
				bool isLoopStructureUnderstood(PHINode *InnerInductionVar);

				bool currentLimitations();

				private:
				bool checkDependence(Loop *Outer, DependenceAnalysis *DA);
				bool tightlyNested(Loop *Outer, Loop *Inner);

				Loop *OuterLoop;
				Loop *InnerLoop;

				/// Scev analysis.
				ScalarEvolution *SE;
				/// Dependence analysis.
				DependenceAnalysis *DA;
				LoopInterchange *Parent;
				};

				hfinkel: Can you include Src and Des in these debug messages so that we can see what instructions are relevant?
				karthikthecool (Author): Done.
				/// LoopInterchangeProfitability checks if it is profitable to interchange the
				/// loop.
				class LoopInterchangeProfitability {
				public:
				LoopInterchangeProfitability(Loop *Outer, Loop *Inner, ScalarEvolution *SE)
				: OuterLoop(Outer), InnerLoop(Inner), SE(SE) {}

				/// Check if the loop interchange is profitable
				bool isProfitable(unsigned InnerLoopId, unsigned OuterLoopId,
				CharMatrix &DepMatrix);

				private:
				int getInstrOrderCost();

				Loop *OuterLoop;
				Loop *InnerLoop;

				/// Scev analysis.
				ScalarEvolution *SE;
				};

				/// LoopInterchangeTransform interchanges the loop
				class LoopInterchangeTransform {
				public:
				LoopInterchangeTransform(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
				LoopInfo *LI, DominatorTree *DT,
				LoopInterchange *Pass, BasicBlock *LoopNestExit)
				: OuterLoop(Outer), InnerLoop(Inner), SE(SE), LI(LI), DT(DT),
				Parent(Pass), LoopExit(LoopNestExit) {
				initialize();
				}

				/// Interchange OuterLoop and InnerLoop.
				bool transform();
				void restructureLoops(Loop *InnerLoop, Loop *OuterLoop);
				void removeChildLoop(Loop *OuterLoop, Loop *InnerLoop);
				void initialize();

				private:
				void splitInnerLoopLatch(Instruction *);
				void splitOuterLoopLatch();
				void splitInnerLoopHeader();
				bool adjustLoopLinks();
				void adjustLoopPreheaders();
				void adjustOuterLoopPreheader();
				void adjustInnerLoopPreheader();
				bool adjustLoopBranches();

				Loop *OuterLoop;
				Loop *InnerLoop;

				/// Scev analysis.
				hfinkel: How are you checking for reductions here? Do you need to check that the one PHI you've found is not used outside of the loop?
				karthikthecool (Author): We currently check that there is only one PHI node in the loop header, which will correspond to the induction variable. If we find any other PHIs, either due to reductions or a triangular loop structure, we currently bail out as a current limitation.
				hfinkel: No, I mean uses outside of the loops in general. I don't think you check for that. You check for: if (numUsageinLatch + numUsageinHeader != 1) return false; but the PHI could be used in any block dominated by the loop. Do you need to check for that? int i, j; for (int i = 0; i < n; ++i) for (int j = 0; j < m; ++j) a[i][j] = 7; cout << "final i, j = " << i << ", " << j << "\n";
				karthikthecool (Author): Hi Hal, the way I was handling this was: since the loops were tightly nested, we were getting the LCSSA PHIs for these loops in the outer loop latch, which I was splitting and moving outside the loop. I was able to get the correct values for i and j in this case. But I think I can add a check to avoid these cases as well. Since we do not want uses outside the loop, is it OK to have a check like - if (isa<PHINode>(InnerLoopLatch->begin())) return false; if (isa<PHINode>(OuterLoopLatch->begin())) return false; This will make sure we do not have any outside uses of values defined inside the loop. Does this check look good? Will it make this transform too restrictive? Thanks for answering my silly queries; I'm still getting a hold of loop optimizations. Regards, Karthik Bhat
				hfinkel: Ah, you're right. I think that, given our current restrictions, the final values outside the loop nest will always be the same, so this is fine. (We should have a regression test showing that we can still interchange in this case.)
				ScalarEvolution *SE;
				LoopInfo *LI;
				DominatorTree *DT;
				LoopInterchange *Parent;
				BasicBlock *LoopExit;
				};

				// Main LoopInterchange Pass
				struct LoopInterchange : public FunctionPass {
				static char ID;
				ScalarEvolution *SE;
				LoopInfo *LI;
				DependenceAnalysis *DA;
				DominatorTree *DT;
				LoopInterchange()
				: FunctionPass(ID), SE(nullptr), LI(nullptr), DA(nullptr), DT(nullptr) {
				initializeLoopInterchangePass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<ScalarEvolution>();
				AU.addRequired<AliasAnalysis>();
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<LoopInfoWrapperPass>();
				AU.addRequired<DependenceAnalysis>();
				AU.addRequiredID(LoopSimplifyID);
				AU.addRequiredID(LCSSAID);
				}

				bool runOnFunction(Function &F) override {
				SE = &getAnalysis<ScalarEvolution>();
				hfinkel: Why are you only counting uses in the latch block? Should the increment be in some other block, then what?
				karthikthecool (author): We count the uses in the loop latch only, as we split the latch based on this instruction. This was done because the mentioned example generated code as:
				    for.body3: ; preds = %for.body3, %for.body3.lr.ph
				      %j.018 = phi i32 [ 0, %for.body3.lr.ph ], [ %add6, %for.body3 ]
				      %arrayidx4 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %j.018, i32 %i.020
				      %5 = load i32* %arrayidx4, align 4, !tbaa !1
				      %add = add nsw i32 %3, %5
				      %add6 = add nuw nsw i32 %j.018, 1
				      %arrayidx8 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %add6, i32 %add5
				      store i32 %add, i32* %arrayidx8, align 4, !tbaa !1
				      %exitcond = icmp eq i32 %j.018, %4
				      br i1 %exitcond, label %for.inc9.loopexit, label %for.body3
				Since we cannot split at %add6 = add nuw nsw i32 %j.018, 1, we give up in this case. But now that I think about it, counting uses may not be the right way to check whether we can split the inner loop latch. Consider the following valid loop, where we fail with this check:
				    for(int i=0;i<100;i++)
				      for(int j=0;j<100;j++)
				        A[j][i] = A[j][i]+k;
				Here we get the inner loop latch as:
				    for.body3: ; preds = %for.body3, %for.cond1.preheader
				      %j.015 = phi i32 [ 0, %for.cond1.preheader ], [ %inc, %for.body3 ]
				      %arrayidx4 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %j.015, i32 %i.016
				      %1 = load i32* %arrayidx4, align 4, !tbaa !1
				      %add = add nsw i32 %0, %1
				      store i32 %add, i32* %arrayidx4, align 4, !tbaa !1
				      %inc = add nuw nsw i32 %j.015, 1
				      %exitcond = icmp eq i32 %inc, 100
				      br i1 %exitcond, label %for.inc7, label %for.body3
				This could have been split at %inc = add nuw nsw i32 %j.015, 1, but we fail because we find more than one use. Modified the logic to check for a tightly grouped inner loop latch that can be split.
				LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
				DA = &getAnalysis<DependenceAnalysis>();
				auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();
				DT = DTWP ? &DTWP->getDomTree() : nullptr;
				// Build up a worklist of loop pairs to analyze.
				SmallVector<LoopVector, 8> Worklist;

				for (Loop *L : *LI)
				populateWorklist(*L, Worklist);

				DEBUG(dbgs() << "Worklist size = " << Worklist.size() << "\n");
				bool Changed = false;
				while (!Worklist.empty()) {
				hfinkel: Let's say, "Inner or outer loops lack a preheader"? Also, for the future, adding a preheader when one is not present is pretty easy (you just need to call InsertPreheaderForLoop from llvm/Transforms/Utils/LoopUtils.h), so this is a limitation that should be removed sooner rather than later (although after the initial commit is okay).
				karthikthecool (author): Updated code to add a preheader when not present.
				LoopVector LoopList = Worklist.pop_back_val();
				Changed |= processLoopList(LoopList);
				}
				return Changed;
				}

				bool isComputableLoopNest(LoopVector LoopList) {
				for (auto I = LoopList.begin(), E = LoopList.end(); I != E; ++I) {
				Loop *L = *I;
				const SCEV *ExitCountOuter = SE->getBackedgeTakenCount(L);
				if (ExitCountOuter == SE->getCouldNotCompute()) {
				DEBUG(dbgs() << "Couldn't compute Backedge count\n");
				return false;
				}
				if (L->getNumBackEdges() != 1) {
				DEBUG(dbgs() << "NumBackEdges is not equal to 1\n");
				return false;
				}
				if (!L->getExitingBlock()) {
				DEBUG(dbgs() << "Loop doesn't have a unique exiting block\n");
				return false;
				}
				}
				return true;
				}

				unsigned selectLoopForInterchange(LoopVector LoopList) {
				// TODO: Add a better heuristic to select the loop to be interchanged based
				// on the dependence matrix. Currently we select the innermost loop.
				return LoopList.size() - 1;
				}

				bool processLoopList(LoopVector LoopList) {
				bool Changed = false;
				bool containsLCSSAPHI = false;
				CharMatrix DependencyMatrix;
				if (LoopList.size() < 2) {
				DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n");
				return false;
				}
				if (!isComputableLoopNest(LoopList)) {
				hfinkel: if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(UseInstr)) { ... }
				karthikthecool (author): Done.
				DEBUG(dbgs() << "Not valid loop candidate for interchange\n");
				return false;
				hfinkel: What happens if it is not the IV directly, but some expression of the IV? I think you'd be better off using ScalarEvolution here, get the AddRec of the GEP, and see if the "outer" AddRec is provided in terms of the SCEV of the IV (or something like that).
				karthikthecool (author): Updated code to use SCEV to find the loop each operand comes from, to decide whether it is a good or bad load. After this change we are able to handle code like:
				    for(i=0;i<N;i+=1)
				      for(j=0;j<N;j++)
				        A[j-1][i-1] = A[j-1][i-1]+C[j-1][i-1];
				This now gets vectorized after interchange.
				}
				Loop *OuterMostLoop = *(LoopList.begin());

				DEBUG(dbgs() << "Processing LoopList of size = " << LoopList.size()
				<< "\n");

				if (!populateDependencyMatrix(DependencyMatrix, LoopList.size(),
				OuterMostLoop, DA)) {
				DEBUG(dbgs() << "Populating Dependency matrix failed\n");
				return false;
				}
				#ifdef ENABLE_DEBUGGING
				hfinkel: ENABLE_DEBUGGING is too generic for this. How about calling this: DUMP_DEP_MATRICIES
				DEBUG(dbgs() << "Dependence before interchange \n");
				printDepMatrix(DependencyMatrix);
				#endif

				BasicBlock *OuterMostLoopLatch = OuterMostLoop->getLoopLatch();
				BranchInst *OuterMostLoopLatchBI =
				dyn_cast<BranchInst>(OuterMostLoopLatch->getTerminator());
				if (!OuterMostLoopLatchBI)
				return false;

				// Since we currently do not handle LCSSA PHIs, any failure in the loop
				// condition will now branch to LoopNestExit.
				// TODO: This should be removed once we handle LCSSA PHI nodes.

				// Get the Outermost loop exit.
				BasicBlock *LoopNestExit;
				if (OuterMostLoopLatchBI->getSuccessor(0) == OuterMostLoop->getHeader())
				LoopNestExit = OuterMostLoopLatchBI->getSuccessor(1);
				else
				LoopNestExit = OuterMostLoopLatchBI->getSuccessor(0);

				for (auto I = LoopList.begin(), E = LoopList.end(); I != E; ++I) {
				Loop *L = *I;
				BasicBlock *Latch = L->getLoopLatch();
				BasicBlock *Header = L->getHeader();
				if (Latch && Latch != Header && isa<PHINode>(Latch->begin())) {
				containsLCSSAPHI = true;
				break;
				}
				}

				// TODO: Handle LCSSA PHIs.
				hfinkel: Please make this TODO more specific. What happens now and what should happen instead?
				if (containsLCSSAPHI)
				return false;

				unsigned SelecLoopId = selectLoopForInterchange(LoopList);
				hfinkel: Use SplitBlock from include/llvm/Transforms/Utils/BasicBlockUtils.h? (same for other functions below)?
				karthikthecool (author): Updated code.
				// Move the selected loop outwards to the best possible position.
				for (unsigned i = SelecLoopId; i > 0; i--) {
				bool Interchanged =
				processLoop(LoopList, i, i - 1, LoopNestExit, DependencyMatrix);
				if (!Interchanged)
				return Changed;
				// The loops were interchanged; reflect the same in LoopList.
				Loop *OldOuterLoop = LoopList[i - 1];
				LoopList[i - 1] = LoopList[i];
				LoopList[i] = OldOuterLoop;

				// Update the DependencyMatrix
				interChangeDepedencies(DependencyMatrix, i, i - 1);

				#ifdef ENABLE_DEBUGGING
				DEBUG(dbgs() << "Dependence after interchange \n");
				printDepMatrix(DependencyMatrix);
				#endif
				Changed |= Interchanged;
				}
				return Changed;
				}
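processLoopList moves the selected loop outward one level at a time and, after each successful swap, keeps LoopList and the dependency matrix in sync. A minimal model of that bookkeeping follows, using a `std::vector<std::vector<char>>` with one row per dependence and one column per loop (the shape processLoop's `DependencyMatrix` parameter has). `swapColumns` is a hypothetical stand-in for interChangeDepedencies, and the `CanSwap` callback is a hypothetical stand-in for processLoop.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One row per dependence, one column per loop.
using CharMatrix = std::vector<std::vector<char>>;

// Hypothetical stand-in for interChangeDepedencies: swap two loop columns.
void swapColumns(CharMatrix &Deps, unsigned A, unsigned B) {
  for (auto &Row : Deps)
    std::swap(Row[A], Row[B]);
}

// Mirrors the loop at the end of processLoopList. CanSwap(Inner, Outer) is a
// hypothetical callback standing in for the legality/profitability/transform
// sequence in processLoop.
bool moveLoopOutward(std::vector<std::string> &Loops, CharMatrix &Deps,
                     unsigned Selected,
                     const std::function<bool(unsigned, unsigned)> &CanSwap) {
  assert(Selected < Loops.size() && "selected loop out of range");
  bool Changed = false;
  for (unsigned i = Selected; i > 0; --i) {
    if (!CanSwap(i, i - 1))
      break;  // the pass returns Changed at this point
    std::swap(Loops[i], Loops[i - 1]);  // keep the loop list in nest order
    swapColumns(Deps, i, i - 1);        // keep the matrix columns aligned
    Changed = true;
  }
  return Changed;
}
```

With every swap accepted, moving `k` out of an `i, j, k` nest yields `k, i, j`; if a swap is rejected, the loop stops where it is and reports whether anything changed before that point.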

				bool processLoop(LoopVector LoopList, unsigned InnerLoopId,
				unsigned OuterLoopId, BasicBlock *LoopNestExit,
				std::vector<std::vector<char>> &DependencyMatrix) {

				DEBUG(dbgs() << "Processing Inner Loop Id = " << InnerLoopId
				<< " and OuterLoopId = " << OuterLoopId << "\n");
				Loop *InnerLoop = LoopList[InnerLoopId];
				Loop *OuterLoop = LoopList[OuterLoopId];

				LoopInterchangeLegality LIL(OuterLoop, InnerLoop, SE, DA, this);
				if (!LIL.canInterchangeLoops(InnerLoopId, OuterLoopId, DependencyMatrix)) {
				DEBUG(dbgs() << "Not interchanging Loops. Cannot prove legality\n");
				return false;
				}
				DEBUG(dbgs() << "Loops are legal to interchange\n");
				LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE);
				if (!LIP.isProfitable(InnerLoopId, OuterLoopId, DependencyMatrix)) {
				DEBUG(dbgs() << "Interchanging Loops not profitable\n");
				return false;
				}

				LoopInterchangeTransform LIT(OuterLoop, InnerLoop, SE, LI, DT, this,
				LoopNestExit);
				LIT.transform();
				DEBUG(dbgs() << "Loops interchanged\n");
				return true;
				}
				};

				} // end of namespace

				static bool containsUnsafeInstructions(BasicBlock *BB) {
				for (auto I = BB->begin(), E = BB->end(); I != E; ++I) {
				if (I->mayHaveSideEffects() || I->mayReadFromMemory())
				return true;
				}
				return false;
				}

				bool LoopInterchangeLegality::tightlyNested(Loop *OuterLoop, Loop *InnerLoop) {
				BasicBlock *OuterLoopHeader = OuterLoop->getHeader();
				BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
				BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();

				DEBUG(dbgs() << "Checking if Loops are Tightly Nested\n");

				// A perfectly nested loop will not have any branch in between the outer and
				// inner loops, i.e. the outer header will branch to either the inner
				// preheader or the outer loop latch.
				BranchInst *outerLoopHeaderBI =
				dyn_cast<BranchInst>(OuterLoopHeader->getTerminator());
				if (!outerLoopHeaderBI)
				return false;
				unsigned num = outerLoopHeaderBI->getNumSuccessors();
				for (unsigned i = 0; i < num; i++) {
				if (outerLoopHeaderBI->getSuccessor(i) != InnerLoopPreHeader &&
				outerLoopHeaderBI->getSuccessor(i) != OuterLoopLatch)
				return false;
				}

				DEBUG(dbgs() << "Checking instructions in Loop header and Loop latch \n");
				// We do not have any basic block in between; now make sure the outer header
				// and outer loop latch don't contain any unsafe instructions.
				if (containsUnsafeInstructions(OuterLoopHeader) ||
				containsUnsafeInstructions(OuterLoopLatch))
				return false;

				DEBUG(dbgs() << "Loops are perfectly nested \n");
				// We have a perfect loop nest.
				return true;
				}

				bool LoopInterchangeLegality::checkDependence(Loop *Outer,
				DependenceAnalysis *DA) {

				typedef SmallVector<Value *, 16> ValueVector;
				// Holds Load and Store instructions.
				ValueVector MemInstr;
				// For each block.
				hfinkel: No need for { } here.
				karthikthecool (author): Done.
				for (Loop::block_iterator BI = Outer->block_begin(), BE = Outer->block_end();
				BI != BE; ++BI) {
				// Scan the BB and collect legal loads and stores.
				for (BasicBlock::iterator I = (*BI)->begin(), E = (*BI)->end(); I != E;
				++I) {
				Instruction *Ins = dyn_cast<Instruction>(I);
				if (!Ins)
				return false;
				LoadInst *Ld = dyn_cast<LoadInst>(I);
				StoreInst *St = dyn_cast<StoreInst>(I);
				if (!St && !Ld)
				continue;
				if (Ld && !Ld->isSimple())
				return false;
				if (St && !St->isSimple())
				return false;
				MemInstr.push_back(Ins);
				}
				}

				DEBUG(dbgs() << "Found " << MemInstr.size()
				<< " Loads and stores to analyze\n");

				ValueVector::iterator I, IE, J, JE;
				for (I = MemInstr.begin(), IE = MemInstr.end(); I != IE; ++I) {
				for (J = I, JE = MemInstr.end(); J != JE; ++J) {
				Instruction *Src = cast<Instruction>(*I);
				Instruction *Des = cast<Instruction>(*J);
				if (Src == Des)
				continue;
				DEBUG(dbgs() << "Checking Dependency between Src " << *Src << " and Des "
				<< *Des << "\n");

				if (auto D = DA->depends(Src, Des, true)) {
				// TODO: Fix this; only anti/output dependences are handled for now.
				hfinkel: Please make this more specific. We do handle anti deps. What needs to happen?
				if (D->isFlow()) {
				// TODO: Flow dependency can be interchanged??
				DEBUG(dbgs() << "Flow dependence not handled\n");
				return false;
				}
				if (D->isAnti()) {
				DEBUG(dbgs() << "Found Anti dependence \n");
				unsigned Levels = D->getLevels();

				// If the two memory instructions have an anti dependence, check
				// the distance or the direction by which they vary.
				// Interchanging two loops with an anti dependence is valid if the
				// dependence distance is not positive at any level.
				for (unsigned II = 1; II <= Levels; ++II) {
				const SCEV *Distance = D->getDistance(II);
				const SCEVConstant *SCEVConst =
				dyn_cast_or_null<SCEVConstant>(Distance);
				if (SCEVConst) {
				const ConstantInt *CI = SCEVConst->getValue();
				if (!CI || (!CI->isNegative() && !CI->isZeroValue()))
				return false;
				} else if (D->isScalar(II)) {
				DEBUG(dbgs()
				<< "TODO: Scalar dependences are currently not handled\n");
				return false;
				} else {
				unsigned Direction = D->getDirection(II);
				if (Direction == Dependence::DVEntry::LT ||
				Direction == Dependence::DVEntry::LE ||
				Direction == Dependence::DVEntry::EQ)
				continue;
				return false;
				}
				}
				}
				}
				}
				}
				hfinkel: Use llvm_unreachable, not assert(0 &&
				karthikthecool (author): Done.

				return true;
				}
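The anti-dependence branch of checkDependence accepts a pair of memory accesses only if, at every level, a constant distance is zero or negative, or, when the distance is not a constant and the level is not scalar, the direction is `<`, `<=` or `=`. The sketch below restates that rule over a hypothetical `Level` summary instead of `llvm::Dependence`. The `Dir` enum is an illustrative stand-in for `Dependence::DVEntry`; its enumerators and their order are not meant to match LLVM's.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Illustrative direction values; not the encoding used by Dependence::DVEntry.
enum class Dir { LT, EQ, LE, GT, GE, NE, ALL };

// Hypothetical summary of one level of an anti dependence.
struct Level {
  std::optional<long> Distance;  // engaged when the distance is a known constant
  bool Scalar = false;           // corresponds to D->isScalar(Level)
  Dir Direction = Dir::ALL;      // consulted only when Distance is not constant
};

// Restates the per-level loop that checkDependence runs for an anti dependence.
bool antiDepAllowsInterchange(const std::vector<Level> &Levels) {
  for (const Level &L : Levels) {
    if (L.Distance) {
      if (*L.Distance > 0)
        return false;  // positive constant distance: reject
    } else if (L.Scalar) {
      return false;    // scalar dependences are not handled yet
    } else if (L.Direction != Dir::LT && L.Direction != Dir::LE &&
               L.Direction != Dir::EQ) {
      return false;    // any other direction: reject
    }
  }
  return true;
}
```

Checking every level before accepting mirrors the original, which returns `false` as soon as any level fails.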

				static unsigned getPHICount(BasicBlock *BB) {
				unsigned PhiCount = 0;
				for (auto I = BB->begin(); isa<PHINode>(I); ++I)
				PhiCount++;
				return PhiCount;
				}

				bool LoopInterchangeLegality::isLoopStructureUnderstood(
				PHINode *InnerInduction) {

				unsigned Num = InnerInduction->getNumOperands();
				BasicBlock *InnerLoopPreheader = InnerLoop->getLoopPreheader();
				for (unsigned i = 0; i < Num; ++i) {
				Value *Val = InnerInduction->getOperand(i);
				if (isa<Constant>(Val))
				continue;
				Instruction *I = dyn_cast<Instruction>(Val);
				if (!I)
				return false;
				// TODO: Handle triangular loops.
				// e.g. for(int i=0;i<N;i++)
				// for(int j=i;j<N;j++)
				unsigned IncomBlockIndx = PHINode::getIncomingValueNumForOperand(i);
				if (InnerInduction->getIncomingBlock(IncomBlockIndx) ==
				InnerLoopPreheader &&
				!OuterLoop->isLoopInvariant(I)) {
				return false;
				}
				}
				return true;
				}

				// This function indicates the current limitations in the transform as a result
				// of which we do not proceed.
				bool LoopInterchangeLegality::currentLimitations() {

				BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
				BasicBlock *InnerLoopHeader = InnerLoop->getHeader();
				BasicBlock *OuterLoopHeader = OuterLoop->getHeader();
				BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();
				BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();

				PHINode *InnerInductionVar;
				PHINode *OuterInductionVar;

				// We currently handle only 1 induction variable inside the loop. We also do
				// not handle reductions as of now.
				if (getPHICount(InnerLoopHeader) > 1)
				return true;

				if (getPHICount(OuterLoopHeader) > 1)
				return true;

				InnerInductionVar = getInductionVariable(InnerLoop, SE);
				OuterInductionVar = getInductionVariable(OuterLoop, SE);

				if (!OuterInductionVar || !InnerInductionVar) {
				DEBUG(dbgs() << "Induction variable not found\n");
				hfinkel: Why?
				return true;
				}

				// TODO: Triangular loops are not handled for now.
				if (!isLoopStructureUnderstood(InnerInductionVar)) {
				DEBUG(dbgs() << "Loop structure not understood by pass\n");
				return true;
				}

				// TODO: Loops with LCSSA PHI's are currently not handled.
				if (isa<PHINode>(OuterLoopLatch->begin())) {
				DEBUG(dbgs() << "Found an LCSSA PHI in outer loop latch\n");
				return true;
				}
				if (InnerLoopLatch != InnerLoopHeader &&
				isa<PHINode>(InnerLoopLatch->begin())) {
				DEBUG(dbgs() << "Found an LCSSA PHI in inner loop latch\n");
				return true;
				}

				// TODO: Current limitation: since we split the inner loop latch at the point
				// where the induction variable is incremented (induction.next), we cannot
				// have more than one user of induction.next, since that would result in
				// broken code after the split.
				// e.g.
				// for(i=0;i<N;i++) {
				// for(j = 0;j<M;j++) {
				// A[j+1][i+2] = A[j][i]+k;
				// }
				// }
				hfinkel: PHIs are always at the beginning of the block; once you hit the first non-PHI, you can exit the loop (you should never find another).
				karthikthecool (author): Yes, you are right. Modified code.
				bool FoundInduction = false;
				Instruction *InnerIndexVarInc = nullptr;
				if (InnerInductionVar->getIncomingBlock(0) == InnerLoopPreHeader)
				InnerIndexVarInc =
				dyn_cast<Instruction>(InnerInductionVar->getIncomingValue(1));
				else
				InnerIndexVarInc =
				dyn_cast<Instruction>(InnerInductionVar->getIncomingValue(0));

				if (!InnerIndexVarInc)
				return true;

				// Since we split the inner loop latch on this induction variable, make sure
				// we do not have any instruction between the induction variable and the
				// branch instruction.

				for (auto I = InnerLoopLatch->rbegin(), E = InnerLoopLatch->rend();
				I != E && !FoundInduction; ++I) {
				if (isa<BranchInst>(I) || isa<CmpInst>(I) || isa<TruncInst>(*I))
				continue;
				const Instruction &Ins = *I;
				// We found an instruction. If this is not induction variable then it is not
				// safe to split this loop latch.
				if (!Ins.isIdenticalTo(InnerIndexVarInc))
				return true;
				else
				FoundInduction = true;
				}
				// The loop latch ended and we didn't find the induction variable; treat this
				// as a current limitation.
				if (!FoundInduction)
				return true;

				return false;
				}
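The latch scan at the end of currentLimitations can be modelled without LLVM: walk the latch bottom-up, skip branches, compares and truncs, and require the first remaining instruction to be the induction increment. In this sketch an instruction is a hypothetical opcode/name pair (`Inst`), only `icmp` stands in for `CmpInst`, and the increment is matched by name, whereas the pass uses `isIdenticalTo`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction: opcode plus result name.
struct Inst {
  std::string Opcode;  // e.g. "add", "icmp", "br", "trunc", "store"
  std::string Name;    // result name, empty if the instruction has none
};

// Bottom-up scan of the latch: skip br/icmp/trunc and accept only if the first
// other instruction is the induction increment named IncName.
bool latchCanBeSplit(const std::vector<Inst> &Latch, const std::string &IncName) {
  for (auto It = Latch.rbegin(); It != Latch.rend(); ++It) {
    if (It->Opcode == "br" || It->Opcode == "icmp" || It->Opcode == "trunc")
      continue;
    return It->Name == IncName;  // anything else in between blocks the split
  }
  return false;  // reached the top of the latch without finding the increment
}
```

Fed the two `for.body3` latches quoted in the review, it rejects the first (a `store` sits between `%add6` and the compare) and accepts the second (`%inc` directly precedes the compare and branch).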

				bool LoopInterchangeLegality::canInterchangeLoops(unsigned InnerLoopId,
				unsigned OuterLoopId,
				hfinkel: I don't really understand this comment. I think we can assume that LICM has run first. (And if this pass detects loop-invariant code better than LICM, that is another problem to fix, but not here.)
				karthikthecool (author): Hi Hal, consider the following matrix multiplication code:
				    for(int i=0;i<N;i++)
				      for(int j=0;j<N;j++)
				        for(int k=0;k<N;k++)
				          A[i][j]= A[i][j]+B[i][k]*C[k][j]
				In this example the direction vector would be [= = |<] (i.e. an '=' dependence in i, an '=' dependence in j, and a loop-independent dependence in k). The LICM pass would move the getelementptr for A[i][j] outside the inner loop, but it cannot move the complete statement outside the inner loop. Since the vectorizer only works on the inner loop, the above code is not vectorized for i, j. But if we interchange the loops to:
				    for(int k=0;k<N;k++)
				      for(int i=0;i<N;i++)
				        for(int j=0;j<N;j++)
				          A[i][j]= A[i][j]+B[i][k]*C[k][j]
				the loop now gets vectorized. It is mostly profitable to keep loop-independent dependencies such as the above at the outermost possible level. We try to achieve the same here.
				CharMatrix &DepMatrix) {

				if (!isLegalToInterChangeLoops(DepMatrix, InnerLoopId, OuterLoopId)) {
				DEBUG(dbgs() << "Failed interchange InnerLoopId = " << InnerLoopId
				<< " and OuterLoopId = " << OuterLoopId
				<< " due to dependence\n");
				return false;
				}

				// Create unique preheaders if we do not already have them.
				BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();
				BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();

				// Create a unique outer preheader -
				// 1) If OuterLoop preheader is not present.
				// 2) If OuterLoop Preheader is same as OuterLoop Header
				// 3) If OuterLoop Preheader is same as Header of the previous loop.
				// 4) If OuterLoop Preheader is Entry node.
				if (!OuterLoopPreHeader || OuterLoopPreHeader == OuterLoop->getHeader() ||
				isa<PHINode>(OuterLoopPreHeader->begin()) ||
				!OuterLoopPreHeader->getUniquePredecessor()) {
				OuterLoopPreHeader = InsertPreheaderForLoop(OuterLoop, Parent);
				}

				if (!InnerLoopPreHeader || InnerLoopPreHeader == InnerLoop->getHeader() ||
				InnerLoopPreHeader == OuterLoop->getHeader()) {
				InnerLoopPreHeader = InsertPreheaderForLoop(InnerLoop, Parent);
				}

				// Check if the loops are tightly nested.
				if (!tightlyNested(OuterLoop, InnerLoop)) {
				DEBUG(dbgs() << "Loops not tightly nested\n");
				return false;
				}

				// TODO: The loops could not be interchanged due to current limitations in the
				// transform module.
				if (currentLimitations()) {
				DEBUG(dbgs() << "Not legal because of current transform limitation\n");
				return false;
				}

				return true;
				}
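The matrix-multiplication argument in the reply above can be checked directly: the `i, j, k` and `k, i, j` orders compute the same `A`, and in the second one the innermost `j` loop walks `A[i][*]` and `C[k][*]` with unit stride while `B[i][k]` is invariant in it. A small sketch, assuming square matrices stored as `std::vector<std::vector<long>>`; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long>>;

// Original nest from the review: k innermost, so A[i][j] is loop-invariant
// in the inner loop and C[k][j] is walked down a column.
void matmulIJK(Matrix &A, const Matrix &B, const Matrix &C) {
  const std::size_t N = A.size();
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t k = 0; k < N; ++k)
        A[i][j] = A[i][j] + B[i][k] * C[k][j];
}

// Interchanged nest: j innermost, so A[i][j] and C[k][j] are contiguous and
// B[i][k] is invariant in the inner loop.
void matmulKIJ(Matrix &A, const Matrix &B, const Matrix &C) {
  const std::size_t N = A.size();
  for (std::size_t k = 0; k < N; ++k)
    for (std::size_t i = 0; i < N; ++i)
      for (std::size_t j = 0; j < N; ++j)
        A[i][j] = A[i][j] + B[i][k] * C[k][j];
}
```

Each `A[i][j]` still accumulates its `k` terms in increasing `k` order in both versions, so the results match exactly.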

				int LoopInterchangeProfitability::getInstrOrderCost() {
				int GoodOrder = 0, BadOrder = 0;
				for (auto BI = InnerLoop->block_begin(), BE = InnerLoop->block_end();
				BI != BE; ++BI) {
				for (auto I = (*BI)->begin(), E = (*BI)->end(); I != E; ++I) {
				const Instruction &Ins = *I;
				if (const GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(&Ins)) {
				unsigned NumOp = GEP->getNumOperands();
				bool FoundInnerInduction = false;
				bool FoundOuterInduction = false;
				for (unsigned i = 0; i < NumOp; ++i) {
				const SCEV *OperandVal = SE->getSCEV(GEP->getOperand(i));
				const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(OperandVal);
				if (!AR)
				continue;
				hfinkel: No need for the { }
				karthikthecool (author): Done.

				// If we find the inner induction after an outer induction e.g.
				// for(int i=0;i<N;i++)
				// for(int j=0;j<N;j++)
				// A[i][j] = A[i-1][j-1]+k;
				// then it is a good order.
				if (AR->getLoop() == InnerLoop) {
				// We found an InnerLoop induction after OuterLoop induction. It is
				// a good order.
				FoundInnerInduction = true;
				if (FoundOuterInduction) {
				GoodOrder++;
				break;
				}
				}
				// If we find the outer induction after an inner induction e.g.
				// for(int i=0;i<N;i++)
				// for(int j=0;j<N;j++)
				// A[j][i] = A[j-1][i-1]+k;
				// then it is a bad order.
				if (AR->getLoop() == OuterLoop) {
				// We found an OuterLoop induction after InnerLoop induction. It is
				// a bad order.
				FoundOuterInduction = true;
				if (FoundInnerInduction) {
				BadOrder++;
				break;
				}
				}
				}
				}
				}
				}
				return GoodOrder - BadOrder;
				}
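getInstrOrderCost scores each GEP in the inner loop by the order in which AddRecs of the two loops appear among its operands. The sketch below replays that scoring on a hypothetical encoding in which each operand is tagged `Outer`, `Inner` or `None` (`None` for anything that is not an AddRec of either loop, such as the base `@A` or the leading `0` index). For an `A[j][i]` access with `i` outer and `j` inner, the subscripts are tagged `Inner` then `Outer`, so each such GEP counts as a bad order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tag for a GEP operand: which loop its AddRec belongs to.
enum class IV { None, Outer, Inner };

// Replays getInstrOrderCost: +1 for a GEP where an inner-loop AddRec follows an
// outer-loop one, -1 where an outer-loop AddRec follows an inner-loop one.
int instrOrderCost(const std::vector<std::vector<IV>> &Geps) {
  int Good = 0, Bad = 0;
  for (const auto &Ops : Geps) {
    bool SawInner = false, SawOuter = false;
    for (IV Op : Ops) {
      if (Op == IV::Inner) {
        SawInner = true;
        if (SawOuter) {
          ++Good;  // row-major walk in the inner loop
          break;
        }
      } else if (Op == IV::Outer) {
        SawOuter = true;
        if (SawInner) {
          ++Bad;   // column-wise walk in the inner loop
          break;
        }
      }
    }
  }
  return Good - Bad;
}
```

A negative total (more bad orders than good) is what makes isProfitable return `true` on cache grounds.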

				bool isProfitabileForVectorization(unsigned InnerLoopId, unsigned OuterLoopId,
				CharMatrix &DepMatrix) {
				// TODO: Improve this heuristic to catch more cases.
				// If the inner loop is loop independent or doesn't carry any dependency it is
				// profitable to move this to outer position.
				unsigned Row = DepMatrix.size();
				for (unsigned i = 0; i < Row; ++i) {
				if (DepMatrix[i][InnerLoopId] != 'S' && DepMatrix[i][InnerLoopId] != 'I')
				return false;
				// TODO: We need to improve this heuristic.
				if (DepMatrix[i][OuterLoopId] != '=')
				return false;
				}
				// If outer loop has dependence and inner loop is loop independent then it is
				// profitable to interchange to enable parallelism.
				return true;
				}
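When the cache-order cost is not negative, isProfitable falls back to isProfitabileForVectorization. Restated on a plain matrix (rows are dependences, columns are loops), the test requires every row to be `S` or `I` in the inner loop's column and `=` in the outer loop's column. `improvesParallelism` is a hypothetical name for this restatement.

```cpp
#include <cassert>
#include <vector>

using CharMatrix = std::vector<std::vector<char>>;

// Same condition as isProfitabileForVectorization, on a plain matrix.
bool improvesParallelism(const CharMatrix &Dep, unsigned Inner, unsigned Outer) {
  for (const auto &Row : Dep) {
    if (Row[Inner] != 'S' && Row[Inner] != 'I')
      return false;  // inner loop carries a dependence
    if (Row[Outer] != '=')
      return false;  // outer loop must have an '=' entry
  }
  return true;       // vacuously true when there are no dependences
}
```

For a hypothetical matrix-multiplication row `= = I`, with `k` in column 2 and `j` in column 1, the check passes, so moving `k` outward is considered profitable.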

				bool LoopInterchangeProfitability::isProfitable(unsigned InnerLoopId,
				unsigned OuterLoopId,
				CharMatrix &DepMatrix) {

				// TODO: Add better profitability checks.
				// e.g
				// 1) Construct dependency matrix and move the one with no loop carried dep
				// inside to enable vectorization.

				// This is a rough cost estimation algorithm. It counts the good and bad
				// orders of induction variables in the instructions and allows reordering
				// if the number of bad orders exceeds the number of good ones.
				int Cost = 0;
				Cost += getInstrOrderCost();
				DEBUG(dbgs() << "Cost = " << Cost << "\n");
				if (Cost < 0)
				return true;

				// It is not profitable as per the current cache profitability model, but check if
				// we can move this loop outside to improve parallelism.
				bool ImprovesPar =
				isProfitabileForVectorization(InnerLoopId, OuterLoopId, DepMatrix);
				return ImprovesPar;
				}

				void LoopInterchangeTransform::removeChildLoop(Loop *OuterLoop,
				Loop *InnerLoop) {
				for (Loop::iterator I = OuterLoop->begin(), E = OuterLoop->end();; ++I) {
				assert(I != E && "Couldn't find loop");
				if (*I == InnerLoop) {
				OuterLoop->removeChildLoop(I);
				return;
				}
				}
				}
				void LoopInterchangeTransform::restructureLoops(Loop *InnerLoop,
				Loop *OuterLoop) {
				Loop *OuterLoopParent = OuterLoop->getParentLoop();
				if (OuterLoopParent) {
				// Remove the loop from its parent loop.
				removeChildLoop(OuterLoopParent, OuterLoop);
				removeChildLoop(OuterLoop, InnerLoop);
				OuterLoopParent->addChildLoop(InnerLoop);
				} else {
				removeChildLoop(OuterLoop, InnerLoop);
				LI->changeTopLevelLoop(OuterLoop, InnerLoop);
				}

				while (!InnerLoop->empty())
				OuterLoop->addChildLoop(InnerLoop->removeChildLoop(InnerLoop->begin()));

				InnerLoop->addChildLoop(OuterLoop);
				}
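restructureLoops swaps the nesting of the two loops in the loop tree: the inner loop takes the outer loop's place under its parent (or among the top-level loops), the inner loop's children move to the outer loop, and the outer loop becomes a child of the inner loop. The model below mirrors those steps on a hypothetical `LoopNode` tree, with a vector standing in for LoopInfo's top-level loops. It moves children by repeatedly taking the first one until the list is empty, so no iterator into a child list is held across a removal.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical loop-tree node standing in for llvm::Loop.
struct LoopNode {
  const char *Name;
  LoopNode *Parent = nullptr;
  std::vector<LoopNode *> Children;
};

void addChild(LoopNode *P, LoopNode *C) {
  P->Children.push_back(C);
  C->Parent = P;
}

void removeChild(LoopNode *P, LoopNode *C) {
  auto It = std::find(P->Children.begin(), P->Children.end(), C);
  assert(It != P->Children.end() && "Couldn't find loop");
  P->Children.erase(It);
  C->Parent = nullptr;
}

// Makes Inner the parent of Outer. TopLevel stands in for LoopInfo's list of
// top-level loops.
void restructure(LoopNode *Inner, LoopNode *Outer,
                 std::vector<LoopNode *> &TopLevel) {
  if (LoopNode *Grand = Outer->Parent) {
    removeChild(Grand, Outer);
    removeChild(Outer, Inner);
    addChild(Grand, Inner);
  } else {
    removeChild(Outer, Inner);
    std::replace(TopLevel.begin(), TopLevel.end(), Outer, Inner);
  }
  // Drain Inner's children into Outer one at a time.
  while (!Inner->Children.empty()) {
    LoopNode *C = Inner->Children.front();
    removeChild(Inner, C);
    addChild(Outer, C);
  }
  addChild(Inner, Outer);
}
```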

				bool LoopInterchangeTransform::transform() {

				DEBUG(dbgs() << "transform\n");
				bool Transformed = false;
				Instruction *InnerIndexVar;

				if (InnerLoop->getSubLoops().size() == 0) {
				BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
				DEBUG(dbgs() << "Calling Split Inner Loop\n");
				PHINode *InductionPHI = getInductionVariable(InnerLoop, SE);
				if (!InductionPHI) {
				DEBUG(dbgs() << "Failed to find the point to split loop latch \n");
				return false;
				}

				if (InductionPHI->getIncomingBlock(0) == InnerLoopPreHeader)
				InnerIndexVar = dyn_cast<Instruction>(InductionPHI->getIncomingValue(1));
				else
				InnerIndexVar = dyn_cast<Instruction>(InductionPHI->getIncomingValue(0));

				//
				// Split at the place where the induction variable is
				// incremented/decremented.
				// TODO: This splitting logic may not always work. Fix this.
				splitInnerLoopLatch(InnerIndexVar);
				DEBUG(dbgs() << "splitInnerLoopLatch Done\n");

				// Splits the inner loop's PHI nodes out into a separate basic block.
				splitInnerLoopHeader();
				DEBUG(dbgs() << "splitInnerLoopHeader Done\n");
				}

				Transformed |= adjustLoopLinks();
				if (!Transformed) {
				DEBUG(dbgs() << "adjustLoopLinks Failed\n");
				return false;
				}

				restructureLoops(InnerLoop, OuterLoop);
				return true;
				}

				void LoopInterchangeTransform::initialize() {}

				void LoopInterchangeTransform::splitInnerLoopLatch(Instruction *inc) {

				BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();
				BasicBlock::iterator I = InnerLoopLatch->begin();
				BasicBlock::iterator E = InnerLoopLatch->end();
				for (; I != E; ++I) {
				if (inc == I)
				break;
				}

				BasicBlock *InnerLoopLatchPred = InnerLoopLatch;
				InnerLoopLatch = SplitBlock(InnerLoopLatchPred, I, DT, LI);
				}

				void LoopInterchangeTransform::splitOuterLoopLatch() {
				BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();
				BasicBlock *OuterLatchLcssaPhiBlock = OuterLoopLatch;
				OuterLoopLatch = SplitBlock(OuterLatchLcssaPhiBlock,
				OuterLoopLatch->getFirstNonPHI(), DT, LI);
				}

				void LoopInterchangeTransform::splitInnerLoopHeader() {

				// Split the inner loop header out.
				BasicBlock *InnerLoopHeader = InnerLoop->getHeader();
				SplitBlock(InnerLoopHeader, InnerLoopHeader->getFirstNonPHI(), DT, LI);

				DEBUG(dbgs() << "Output of splitInnerLoopHeader InnerLoopHeaderSucc & "
				"InnerLoopHeader \n");
				}

				void LoopInterchangeTransform::adjustOuterLoopPreheader() {
				BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();
				SmallVector<Instruction *, 8> Inst;
				for (auto I = OuterLoopPreHeader->begin(), E = OuterLoopPreHeader->end();
				I != E; ++I) {
				if (isa<BranchInst>(*I))
				break;
				Inst.push_back(I);
				}

				BasicBlock *InnerPreHeader = InnerLoop->getLoopPreheader();
				for (auto I = Inst.begin(), E = Inst.end(); I != E; ++I) {
				Instruction *Ins = cast<Instruction>(*I);
				Ins->moveBefore(InnerPreHeader->getTerminator());
				}
				}

				void LoopInterchangeTransform::adjustInnerLoopPreheader() {

				BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
				SmallVector<Instruction *, 8> Inst;
				for (auto I = InnerLoopPreHeader->begin(), E = InnerLoopPreHeader->end();
				I != E; ++I) {
				if (isa<BranchInst>(*I))
				break;
				Inst.push_back(I);
				}
				BasicBlock *OuterHeader = OuterLoop->getHeader();
				for (auto I = Inst.begin(), E = Inst.end(); I != E; ++I) {
				Instruction *Ins = cast<Instruction>(*I);
				Ins->moveBefore(OuterHeader->getTerminator());
				}
				}

bool LoopInterchangeTransform::adjustLoopBranches() {

  DEBUG(dbgs() << "adjustLoopBranches called\n");
  // Adjust the loop preheader
  BasicBlock *InnerLoopHeader = InnerLoop->getHeader();
  BasicBlock *OuterLoopHeader = OuterLoop->getHeader();
  BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();
  BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();
  BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();
  BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
  BasicBlock *OuterLoopPredecessor = OuterLoopPreHeader->getUniquePredecessor();
  BasicBlock *InnerLoopLatchPredecessor =
      InnerLoopLatch->getUniquePredecessor();
  BasicBlock *InnerLoopLatchSuccessor;
  BasicBlock *OuterLoopLatchSuccessor;

  BranchInst *OuterLoopLatchBI =
      dyn_cast<BranchInst>(OuterLoopLatch->getTerminator());
  BranchInst *InnerLoopLatchBI =
      dyn_cast<BranchInst>(InnerLoopLatch->getTerminator());
  BranchInst *OuterLoopHeaderBI =
      dyn_cast<BranchInst>(OuterLoopHeader->getTerminator());
  BranchInst *InnerLoopHeaderBI =
      dyn_cast<BranchInst>(InnerLoopHeader->getTerminator());

  if (!OuterLoopPredecessor || !InnerLoopLatchPredecessor ||
      !OuterLoopLatchBI || !InnerLoopLatchBI || !OuterLoopHeaderBI ||
      !InnerLoopHeaderBI)
    return false;

  BranchInst *InnerLoopLatchPredecessorBI =
      dyn_cast<BranchInst>(InnerLoopLatchPredecessor->getTerminator());
  BranchInst *OuterLoopPredecessorBI =
      dyn_cast<BranchInst>(OuterLoopPredecessor->getTerminator());

  if (!OuterLoopPredecessorBI || !InnerLoopLatchPredecessorBI)
    return false;
  BasicBlock *InnerLoopHeaderSuccessor = InnerLoopHeader->getUniqueSuccessor();
  if (!InnerLoopHeaderSuccessor)
    return false;

  // Adjust the loop preheaders and headers.

  unsigned NumSucc = OuterLoopPredecessorBI->getNumSuccessors();
  for (unsigned i = 0; i < NumSucc; ++i) {
    if (OuterLoopPredecessorBI->getSuccessor(i) == OuterLoopPreHeader)
      OuterLoopPredecessorBI->setSuccessor(i, InnerLoopPreHeader);
  }

  NumSucc = OuterLoopHeaderBI->getNumSuccessors();
  for (unsigned i = 0; i < NumSucc; ++i) {
    if (OuterLoopHeaderBI->getSuccessor(i) == OuterLoopLatch)
      OuterLoopHeaderBI->setSuccessor(i, LoopExit);
    else if (OuterLoopHeaderBI->getSuccessor(i) == InnerLoopPreHeader)
      OuterLoopHeaderBI->setSuccessor(i, InnerLoopHeaderSuccessor);
  }

  BranchInst::Create(OuterLoopPreHeader, InnerLoopHeaderBI);
  InnerLoopHeaderBI->eraseFromParent();

  // -------------Adjust loop latches-----------
  if (InnerLoopLatchBI->getSuccessor(0) == InnerLoopHeader)
    InnerLoopLatchSuccessor = InnerLoopLatchBI->getSuccessor(1);
  else
    InnerLoopLatchSuccessor = InnerLoopLatchBI->getSuccessor(0);

  NumSucc = InnerLoopLatchPredecessorBI->getNumSuccessors();
  for (unsigned i = 0; i < NumSucc; ++i) {
    if (InnerLoopLatchPredecessorBI->getSuccessor(i) == InnerLoopLatch)
      InnerLoopLatchPredecessorBI->setSuccessor(i, InnerLoopLatchSuccessor);
  }

  if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopHeader)
    OuterLoopLatchSuccessor = OuterLoopLatchBI->getSuccessor(1);
  else
    OuterLoopLatchSuccessor = OuterLoopLatchBI->getSuccessor(0);

  if (InnerLoopLatchBI->getSuccessor(1) == InnerLoopLatchSuccessor)
    InnerLoopLatchBI->setSuccessor(1, OuterLoopLatchSuccessor);
  else
    InnerLoopLatchBI->setSuccessor(0, OuterLoopLatchSuccessor);

  if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopLatchSuccessor) {
    OuterLoopLatchBI->setSuccessor(0, InnerLoopLatch);
  } else {
    OuterLoopLatchBI->setSuccessor(1, InnerLoopLatch);
  }

  return true;
}

void LoopInterchangeTransform::adjustLoopPreheaders() {

  // We have interchanged the preheaders, so we need to interchange their
  // contents as well: the contents of the inner preheader were previously
  // executed inside the outer loop.
  BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();
  BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
  BasicBlock *OuterLoopHeader = OuterLoop->getHeader();
  BranchInst *InnerTermBI =
      cast<BranchInst>(InnerLoopPreHeader->getTerminator());

  SmallVector<Value *, 16> OuterPreheaderInstr;
  SmallVector<Value *, 16> InnerPreheaderInstr;

  for (auto I = OuterLoopPreHeader->begin(); !isa<BranchInst>(*I); ++I)
    OuterPreheaderInstr.push_back(&*I);

  for (auto I = InnerLoopPreHeader->begin(); !isa<BranchInst>(*I); ++I)
    InnerPreheaderInstr.push_back(&*I);

  BasicBlock *HeaderSplit =
      SplitBlock(OuterLoopHeader, OuterLoopHeader->getTerminator(), DT, LI);
  Instruction *InsPoint = HeaderSplit->getFirstNonPHI();
  // These instructions should now be executed inside the loop.
  // Move them into a new block split off after the outer loop header.
  for (auto I = InnerPreheaderInstr.begin(), E = InnerPreheaderInstr.end();
       I != E; ++I) {
    Instruction *Ins = cast<Instruction>(*I);
    Ins->moveBefore(InsPoint);
  }
  // These instructions were not previously executed inside the loop, so move
  // them to the old inner loop preheader.
  for (auto I = OuterPreheaderInstr.begin(), E = OuterPreheaderInstr.end();
       I != E; ++I) {
    Instruction *Ins = cast<Instruction>(*I);
    Ins->moveBefore(InnerTermBI);
  }
}

bool LoopInterchangeTransform::adjustLoopLinks() {

  // Adjust all branches in the inner and outer loop.
  bool Changed = adjustLoopBranches();
  if (Changed)
    adjustLoopPreheaders();
  return Changed;
}

char LoopInterchange::ID = 0;
INITIALIZE_PASS_BEGIN(LoopInterchange, "loop-interchange",
                      "Interchanges loops for cache reuse", false, false)
INITIALIZE_AG_DEPENDENCY(AliasAnalysis)
INITIALIZE_PASS_DEPENDENCY(DependenceAnalysis)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
INITIALIZE_PASS_DEPENDENCY(LCSSA)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)

INITIALIZE_PASS_END(LoopInterchange, "loop-interchange",
                    "Interchanges loops for cache reuse", false, false)

Pass *llvm::createLoopInterchangePass() { return new LoopInterchange(); }
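The comment at the top of adjustLoopPreheaders explains why the preheader contents must be swapped along with the preheaders. A source-level sketch in plain C++ may make the reasoning concrete. It is not code from the patch: the function names, the Scale constant and the RowBase computation are hypothetical stand-ins for whatever ends up hoisted into the two preheaders. A value computed in the inner preheader depends on the outer induction variable, so once the loops are swapped it has to be recomputed on every iteration of the new inner loop, while loop-invariant code from the outer preheader can still run once before the whole nest:

```cpp
#include <cassert>
#include <vector>

// Hypothetical source-level view of the nest before interchange:
// i is the outer loop, j the inner loop.
std::vector<int> beforeInterchange(int Rows, int Cols) {
  assert(Rows > 0 && Cols > 0 && "empty nest");
  std::vector<int> Out(Rows * Cols);
  const int Scale = 3;              // "outer preheader": runs once
  for (int i = 0; i < Rows; ++i) {
    int RowBase = i * Cols * Scale; // "inner preheader": runs once per i
    for (int j = 0; j < Cols; ++j)
      Out[i * Cols + j] = RowBase + j;
  }
  return Out;
}

// The same nest after interchange: j is now the outer loop, i the inner one.
std::vector<int> afterInterchange(int Rows, int Cols) {
  assert(Rows > 0 && Cols > 0 && "empty nest");
  std::vector<int> Out(Rows * Cols);
  const int Scale = 3;              // still runs once, before the new outer loop
  for (int j = 0; j < Cols; ++j) {
    for (int i = 0; i < Rows; ++i) {
      // RowBase depends on i, which is now the inner induction variable,
      // so it has to be recomputed inside the new inner loop.
      int RowBase = i * Cols * Scale;
      Out[i * Cols + j] = RowBase + j;
    }
  }
  return Out;
}
```

Both functions should fill the output identically, for example `beforeInterchange(4, 5) == afterInterchange(4, 5)`; the cost of correctness is that the formerly hoisted computation now runs once per inner iteration.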

lib/Transforms/Scalar/Scalar.cpp

@@ ... @@ void llvm::initializeScalarOpts(PassRegistry &Registry) {
   initializeFlattenCFGPassPass(Registry);
   initializeInductiveRangeCheckEliminationPass(Registry);
   initializeIndVarSimplifyPass(Registry);
   initializeJumpThreadingPass(Registry);
   initializeLICMPass(Registry);
   initializeLoopDeletionPass(Registry);
   initializeLoopAccessAnalysisPass(Registry);
   initializeLoopInstSimplifyPass(Registry);
+  initializeLoopInterchangePass(Registry);
   initializeLoopRotatePass(Registry);
   initializeLoopStrengthReducePass(Registry);
   initializeLoopRerollPass(Registry);
   initializeLoopUnrollPass(Registry);
   initializeLoopUnswitchPass(Registry);
   initializeLoopIdiomRecognizePass(Registry);
   initializeLowerAtomicPass(Registry);
   initializeLowerExpectIntrinsicPass(Registry);

test/Transforms/LoopInterchange/currentLimitation.ll

				; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s
				;; These are tests that currently fail to interchange due to a limitation of the pass. They should be interchanged once the pass is extended.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = common global [100 x [100 x i32]] zeroinitializer
				@B = common global [100 x [100 x [100 x i32]]] zeroinitializer

				;;--------------------------------------Test case 01------------------------------------
				;; [FIXME] This loop, though valid, is currently not interchanged because the pass cannot split the inner loop latch when the inner induction
				;; variable has multiple uses (here it is used both to increment the loop counter and to access A[j+1][i+1]).
				;; for(int i=0;i<N-1;i++)
				;; for(int j=1;j<N-1;j++)
				;; A[j+1][i+1] = A[j+1][i+1] + k;

				define void @interchange_01(i32 %k, i32 %N) {
				entry:
				%sub = add nsw i32 %N, -1
				%cmp26 = icmp sgt i32 %N, 1
				br i1 %cmp26, label %for.cond1.preheader.lr.ph, label %for.end17

				for.cond1.preheader.lr.ph:
				%cmp324 = icmp sgt i32 %sub, 1
				%0 = add i32 %N, -2
				%1 = sext i32 %sub to i64
				br label %for.cond1.preheader

				for.cond.loopexit:
				%cmp = icmp slt i64 %indvars.iv.next29, %1
				br i1 %cmp, label %for.cond1.preheader, label %for.end17

				for.cond1.preheader:
				%indvars.iv28 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next29, %for.cond.loopexit ]
				%indvars.iv.next29 = add nuw nsw i64 %indvars.iv28, 1
				br i1 %cmp324, label %for.body4, label %for.cond.loopexit

				for.body4:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body4 ], [ 1, %for.cond1.preheader ]
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%arrayidx7 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next, i64 %indvars.iv.next29
				%2 = load i32, i32* %arrayidx7
				%add8 = add nsw i32 %2, %k
				store i32 %add8, i32* %arrayidx7
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.cond.loopexit, label %for.body4

				for.end17:
				ret void
				}
				;; The inner loop is not split, so the loops are not interchanged.
				; CHECK-LABEL: @interchange_01
				; CHECK: for.body4:
				; CHECK-NEXT: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body4 ], [ 1, %for.body4.preheader ]
				; CHECK-NEXT: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK-NEXT: %arrayidx7 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next, i64 %indvars.iv.next29

test/Transforms/LoopInterchange/interchange.ll

				; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s
				;; These tests check the complete .ll output for the adjustments made to the outer and inner loop headers and latches.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = common global [100 x [100 x i32]] zeroinitializer
				@B = common global [100 x i32] zeroinitializer
				@C = common global [100 x [100 x i32]] zeroinitializer
				@D = common global [100 x [100 x [100 x i32]]] zeroinitializer

				declare void @foo(...)

				;;--------------------------------------Test case 01------------------------------------
				;; for(int i=0;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; A[j][i] = A[j][i]+k;

				define void @interchange_01(i32 %k, i32 %N) {
				entry:
				%cmp21 = icmp sgt i32 %N, 0
				br i1 %cmp21, label %for.cond1.preheader.lr.ph, label %for.end12

				for.cond1.preheader.lr.ph:
				%cmp219 = icmp sgt i32 %N, 1
				%0 = add i32 %N, -1
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv23 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next24, %for.inc10 ]
				br i1 %cmp219, label %for.body3, label %for.inc10

				for.body3:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.cond1.preheader ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv23
				%1 = load i32, i32* %arrayidx5
				%add = add nsw i32 %1, %k
				store i32 %add, i32* %arrayidx5
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc10, label %for.body3

				for.inc10:
				%indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
				%lftr.wideiv25 = trunc i64 %indvars.iv23 to i32
				%exitcond26 = icmp eq i32 %lftr.wideiv25, %0
				br i1 %exitcond26, label %for.end12, label %for.cond1.preheader

				for.end12:
				ret void
				}

				; CHECK-LABEL: @interchange_01
				; CHECK: entry:
				; CHECK: %cmp21 = icmp sgt i32 %N, 0
				; CHECK: br i1 %cmp21, label %for.body3.preheader, label %for.end12
				; CHECK: for.cond1.preheader.lr.ph:
				; CHECK: br label %for.cond1.preheader
				; CHECK: for.cond1.preheader:
				; CHECK: %indvars.iv23 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next24, %for.inc10 ]
				; CHECK: br i1 %cmp219, label %for.body3.split1, label %for.end12.loopexit
				; CHECK: for.body3.preheader:
				; CHECK: %cmp219 = icmp sgt i32 %N, 1
				; CHECK: %0 = add i32 %N, -1
				; CHECK: br label %for.body3
				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader ]
				; CHECK: br label %for.cond1.preheader.lr.ph
				; CHECK: for.body3.split1:
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv23
				; CHECK: %1 = load i32, i32* %arrayidx5
				; CHECK: %add = add nsw i32 %1, %k
				; CHECK: store i32 %add, i32* %arrayidx5
				; CHECK: br label %for.inc10.loopexit
				; CHECK: for.body3.split:
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %0
				; CHECK: br i1 %exitcond, label %for.end12.loopexit, label %for.body3
				; CHECK: for.inc10.loopexit:
				; CHECK: br label %for.inc10
				; CHECK: for.inc10:
				; CHECK: %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
				; CHECK: %lftr.wideiv25 = trunc i64 %indvars.iv23 to i32
				; CHECK: %exitcond26 = icmp eq i32 %lftr.wideiv25, %0
				; CHECK: br i1 %exitcond26, label %for.body3.split, label %for.cond1.preheader
				; CHECK: for.end12.loopexit:
				; CHECK: br label %for.end12
				; CHECK: for.end12:
				; CHECK: ret void

				;;--------------------------------------Test case 02-------------------------------------

				;; for(int i=0;i<100;i++)
				;; for(int j=100;j>=0;j--)
				;; A[j][i] = A[j][i]+k;

				define void @interchange_02(i32 %k) {
				entry:
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv19 = phi i64 [ 0, %entry ], [ %indvars.iv.next20, %for.inc10 ]
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 100, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv19
				%0 = load i32, i32* %arrayidx5
				%add = add nsw i32 %0, %k
				store i32 %add, i32* %arrayidx5
				%indvars.iv.next = add nsw i64 %indvars.iv, -1
				%cmp2 = icmp sgt i64 %indvars.iv, 0
				br i1 %cmp2, label %for.body3, label %for.inc10

				for.inc10:
				%indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1
				%exitcond = icmp eq i64 %indvars.iv.next20, 100
				br i1 %exitcond, label %for.end11, label %for.cond1.preheader

				for.end11:
				ret void
				}

				; CHECK-LABEL: @interchange_02
				; CHECK: entry:
				; CHECK: br label %for.body3.preheader
				; CHECK: for.cond1.preheader.preheader:
				; CHECK: br label %for.cond1.preheader
				; CHECK: for.cond1.preheader:
				; CHECK: %indvars.iv19 = phi i64 [ %indvars.iv.next20, %for.inc10 ], [ 0, %for.cond1.preheader.preheader ]
				; CHECK: br label %for.body3.split1
				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3
				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 100, %for.body3.preheader ]
				; CHECK: br label %for.cond1.preheader.preheader
				; CHECK: for.body3.split1: ; preds = %for.cond1.preheader
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv19
				; CHECK: %0 = load i32, i32* %arrayidx5
				; CHECK: %add = add nsw i32 %0, %k
				; CHECK: store i32 %add, i32* %arrayidx5
				; CHECK: br label %for.inc10
				; CHECK: for.body3.split:
				; CHECK: %indvars.iv.next = add nsw i64 %indvars.iv, -1
				; CHECK: %cmp2 = icmp sgt i64 %indvars.iv, 0
				; CHECK: br i1 %cmp2, label %for.body3, label %for.end11
				; CHECK: for.inc10:
				; CHECK: %indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1
				; CHECK: %exitcond = icmp eq i64 %indvars.iv.next20, 100
				; CHECK: br i1 %exitcond, label %for.body3.split, label %for.cond1.preheader
				; CHECK: for.end11:
				; CHECK: ret void

				;;--------------------------------------Test case 03-------------------------------------
				;; Loops should not be interchanged in this case as it is not profitable.
				;; for(int i=0;i<100;i++)
				;; for(int j=0;j<100;j++)
				;; A[i][j] = A[i][j]+k;

				define void @interchange_03(i32 %k) {
				entry:
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv21 = phi i64 [ 0, %entry ], [ %indvars.iv.next22, %for.inc10 ]
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv21, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx5
				%add = add nsw i32 %0, %k
				store i32 %add, i32* %arrayidx5
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 100
				br i1 %exitcond, label %for.inc10, label %for.body3

				for.inc10:
				%indvars.iv.next22 = add nuw nsw i64 %indvars.iv21, 1
				%exitcond23 = icmp eq i64 %indvars.iv.next22, 100
				br i1 %exitcond23, label %for.end12, label %for.cond1.preheader

				for.end12:
				ret void
				}

				; CHECK-LABEL: @interchange_03
				; CHECK: entry:
				; CHECK: br label %for.cond1.preheader.preheader
				; CHECK: for.cond1.preheader.preheader: ; preds = %entry
				; CHECK: br label %for.cond1.preheader
				; CHECK: for.cond1.preheader: ; preds = %for.cond1.preheader.preheader, %for.inc10
				; CHECK: %indvars.iv21 = phi i64 [ %indvars.iv.next22, %for.inc10 ], [ 0, %for.cond1.preheader.preheader ]
				; CHECK: br label %for.body3.preheader
				; CHECK: for.body3.preheader: ; preds = %for.cond1.preheader
				; CHECK: br label %for.body3
				; CHECK: for.body3: ; preds = %for.body3.preheader, %for.body3
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 0, %for.body3.preheader ]
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv21, i64 %indvars.iv
				; CHECK: %0 = load i32, i32* %arrayidx5
				; CHECK: %add = add nsw i32 %0, %k
				; CHECK: store i32 %add, i32* %arrayidx5
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %exitcond = icmp eq i64 %indvars.iv.next, 100
				; CHECK: br i1 %exitcond, label %for.inc10, label %for.body3
				; CHECK: for.inc10: ; preds = %for.body3
				; CHECK: %indvars.iv.next22 = add nuw nsw i64 %indvars.iv21, 1
				; CHECK: %exitcond23 = icmp eq i64 %indvars.iv.next22, 100
				; CHECK: br i1 %exitcond23, label %for.end12, label %for.cond1.preheader
				; CHECK: for.end12: ; preds = %for.inc10
				; CHECK: ret void


				;;--------------------------------------Test case 04-------------------------------------
				;; Loops should not be interchanged in this case, as interchange would be illegal due to a dependence.
				;; for(int j=0;j<99;j++)
				;; for(int i=0;i<99;i++)
				;; A[j][i+1] = A[j+1][i]+k;

				define void @interchange_04(i32 %k){
				entry:
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv23 = phi i64 [ 0, %entry ], [ %indvars.iv.next24, %for.inc12 ]
				%indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next24, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx5
				%add6 = add nsw i32 %0, %k
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%arrayidx11 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv23, i64 %indvars.iv.next
				store i32 %add6, i32* %arrayidx11
				%exitcond = icmp eq i64 %indvars.iv.next, 99
				br i1 %exitcond, label %for.inc12, label %for.body3

				for.inc12:
				%exitcond25 = icmp eq i64 %indvars.iv.next24, 99
				br i1 %exitcond25, label %for.end14, label %for.cond1.preheader

				for.end14:
				ret void
				}

				; CHECK-LABEL: @interchange_04
				; CHECK: entry:
				; CHECK: br label %for.cond1.preheader
				; CHECK: for.cond1.preheader: ; preds = %for.inc12, %entry
				; CHECK: %indvars.iv23 = phi i64 [ 0, %entry ], [ %indvars.iv.next24, %for.inc12 ]
				; CHECK: %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
				; CHECK: br label %for.body3
				; CHECK: for.body3: ; preds = %for.body3, %for.cond1.preheader
				; CHECK: %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next24, i64 %indvars.iv
				; CHECK: %0 = load i32, i32* %arrayidx5
				; CHECK: %add6 = add nsw i32 %0, %k
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %arrayidx11 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv23, i64 %indvars.iv.next
				; CHECK: store i32 %add6, i32* %arrayidx11
				; CHECK: %exitcond = icmp eq i64 %indvars.iv.next, 99
				; CHECK: br i1 %exitcond, label %for.inc12, label %for.body3
				; CHECK: for.inc12: ; preds = %for.body3
				; CHECK: %exitcond25 = icmp eq i64 %indvars.iv.next24, 99
				; CHECK: br i1 %exitcond25, label %for.end14, label %for.cond1.preheader
				; CHECK: for.end14: ; preds = %for.inc12
				; CHECK: ret void



				;;--------------------------------------Test case 05-------------------------------------
				;; Loops that are not tightly nested are not interchanged.
				;; for(int j=0;j<N;j++) {
				;; B[j] = j+k;
				;; for(int i=0;i<N;i++)
				;; A[j][i] = A[j][i]+B[j];
				;; }

				define void @interchange_05(i32 %k, i32 %N){
				entry:
				%cmp30 = icmp sgt i32 %N, 0
				br i1 %cmp30, label %for.body.lr.ph, label %for.end17

				for.body.lr.ph:
				%0 = add i32 %N, -1
				%1 = zext i32 %k to i64
				br label %for.body

				for.body:
				%indvars.iv32 = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next33, %for.inc15 ]
				%2 = add nsw i64 %indvars.iv32, %1
				%arrayidx = getelementptr inbounds [100 x i32], [100 x i32]* @B, i64 0, i64 %indvars.iv32
				%3 = trunc i64 %2 to i32
				store i32 %3, i32* %arrayidx
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 0, %for.body ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx7 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv32, i64 %indvars.iv
				%4 = load i32, i32* %arrayidx7
				%add10 = add nsw i32 %3, %4
				store i32 %add10, i32* %arrayidx7
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc15, label %for.body3

				for.inc15:
				%indvars.iv.next33 = add nuw nsw i64 %indvars.iv32, 1
				%lftr.wideiv35 = trunc i64 %indvars.iv32 to i32
				%exitcond36 = icmp eq i32 %lftr.wideiv35, %0
				br i1 %exitcond36, label %for.end17, label %for.body

				for.end17:
				ret void
				}

				; CHECK-LABEL: @interchange_05
				; CHECK: entry:
				; CHECK: %cmp30 = icmp sgt i32 %N, 0
				; CHECK: br i1 %cmp30, label %for.body.lr.ph, label %for.end17
				; CHECK: for.body.lr.ph:
				; CHECK: %0 = add i32 %N, -1
				; CHECK: %1 = zext i32 %k to i64
				; CHECK: br label %for.body
				; CHECK: for.body:
				; CHECK: %indvars.iv32 = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next33, %for.inc15 ]
				; CHECK: %2 = add nsw i64 %indvars.iv32, %1
				; CHECK: %arrayidx = getelementptr inbounds [100 x i32], [100 x i32]* @B, i64 0, i64 %indvars.iv32
				; CHECK: %3 = trunc i64 %2 to i32
				; CHECK: store i32 %3, i32* %arrayidx
				; CHECK: br label %for.body3.preheader
				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3
				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 0, %for.body3.preheader ]
				; CHECK: %arrayidx7 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv32, i64 %indvars.iv
				; CHECK: %4 = load i32, i32* %arrayidx7
				; CHECK: %add10 = add nsw i32 %3, %4
				; CHECK: store i32 %add10, i32* %arrayidx7
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %0
				; CHECK: br i1 %exitcond, label %for.inc15, label %for.body3
				; CHECK: for.inc15:
				; CHECK: %indvars.iv.next33 = add nuw nsw i64 %indvars.iv32, 1
				; CHECK: %lftr.wideiv35 = trunc i64 %indvars.iv32 to i32
				; CHECK: %exitcond36 = icmp eq i32 %lftr.wideiv35, %0
				; CHECK: br i1 %exitcond36, label %for.end17.loopexit, label %for.body
				; CHECK: for.end17.loopexit:
				; CHECK: br label %for.end17
				; CHECK: for.end17:
				; CHECK: ret void


				;;--------------------------------------Test case 06-------------------------------------
				;; Loops that are not tightly nested are not interchanged.
				;; for(int j=0;j<N;j++) {
				;; foo();
				;; for(int i=2;i<N;i++)
				;; A[j][i] = A[j][i]+k;
				;; }

				define void @interchange_06(i32 %k, i32 %N) {
				entry:
				%cmp22 = icmp sgt i32 %N, 0
				br i1 %cmp22, label %for.body.lr.ph, label %for.end12

				for.body.lr.ph:
				%0 = add i32 %N, -1
				br label %for.body

				for.body:
				%indvars.iv24 = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next25, %for.inc10 ]
				tail call void (...)* @foo()
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 2, %for.body ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv24, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx5
				%add = add nsw i32 %1, %k
				store i32 %add, i32* %arrayidx5
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc10, label %for.body3

				for.inc10:
				%indvars.iv.next25 = add nuw nsw i64 %indvars.iv24, 1
				%lftr.wideiv26 = trunc i64 %indvars.iv24 to i32
				%exitcond27 = icmp eq i32 %lftr.wideiv26, %0
				br i1 %exitcond27, label %for.end12, label %for.body

				for.end12:
				ret void
				}
				;; Check that the inner loop PHI is not split, i.e. that the loops were not interchanged.
				; CHECK-LABEL: @interchange_06
				; CHECK: phi i64 [ %indvars.iv.next, %for.body3 ], [ 2, %for.body3.preheader ]
				; CHECK-NEXT: getelementptr
				; CHECK-NEXT: %1 = load

				;;--------------------------------------Test case 07-------------------------------------
				;; FIXME:
				;; Test interchange in the presence of an LCSSA PHI. This nest should ideally be interchanged, but this case is not yet supported.
				;; for(gi=1;gi<N;gi++)
				;; for(gj=1;gj<M;gj++)
				;; A[gj][gi] = A[gj - 1][gi] + C[gj][gi];

				@gi = common global i32 0
				@gj = common global i32 0

				define void @interchange_07(i32 %N, i32 %M){
				entry:
				store i32 1, i32* @gi
				%cmp21 = icmp sgt i32 %N, 1
				br i1 %cmp21, label %for.cond1.preheader.lr.ph, label %for.end16

				for.cond1.preheader.lr.ph:
				%cmp218 = icmp sgt i32 %M, 1
				%gi.promoted = load i32, i32* @gi
				%0 = add i32 %M, -1
				%1 = sext i32 %gi.promoted to i64
				%2 = sext i32 %N to i64
				%3 = add i32 %gi.promoted, 1
				%4 = icmp slt i32 %3, %N
				%smax = select i1 %4, i32 %N, i32 %3
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv25 = phi i64 [ %1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next26, %for.inc14 ]
				br i1 %cmp218, label %for.body3, label %for.inc14

				for.body3:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.cond1.preheader ]
				%5 = add nsw i64 %indvars.iv, -1
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %5, i64 %indvars.iv25
				%6 = load i32, i32* %arrayidx5
				%arrayidx9 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @C, i64 0, i64 %indvars.iv, i64 %indvars.iv25
				%7 = load i32, i32* %arrayidx9
				%add = add nsw i32 %7, %6
				%arrayidx13 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv25
				store i32 %add, i32* %arrayidx13
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc14, label %for.body3

				for.inc14:
				%inc.lcssa23 = phi i32 [ 1, %for.cond1.preheader ], [ %M, %for.body3 ]
				%indvars.iv.next26 = add nsw i64 %indvars.iv25, 1
				%cmp = icmp slt i64 %indvars.iv.next26, %2
				br i1 %cmp, label %for.cond1.preheader, label %for.cond.for.end16_crit_edge

				for.cond.for.end16_crit_edge:
				store i32 %inc.lcssa23, i32* @gj
				store i32 %smax, i32* @gi
				br label %for.end16

				for.end16:
				ret void
				}

				; CHECK-LABEL: @interchange_07
				; CHECK: for.body3: ; preds = %for.body3.preheader, %for.body3
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.body3.preheader ]
				; CHECK: %5 = add nsw i64 %indvars.iv, -1
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %5, i64 %indvars.iv25
				; CHECK: %6 = load i32, i32* %arrayidx5
				; CHECK: %arrayidx9 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @C, i64 0, i64 %indvars.iv, i64 %indvars.iv25

				;;------------------------------------------------Test case 08-------------------------------
				;; Test interchange in a loop nest of depth greater than 2.
				;; for(int i=0;i<100;i++)
				;; for(int j=0;j<100;j++)
				;; for(int k=0;k<100;k++)
				;; D[i][k][j] = D[i][k][j]+t;

				define void @interchange_08(i32 %t){
				entry:
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.inc15, %entry
				%i.028 = phi i32 [ 0, %entry ], [ %inc16, %for.inc15 ]
				br label %for.cond4.preheader

				for.cond4.preheader: ; preds = %for.inc12, %for.cond1.preheader
				%j.027 = phi i32 [ 0, %for.cond1.preheader ], [ %inc13, %for.inc12 ]
				br label %for.body6

				for.body6: ; preds = %for.body6, %for.cond4.preheader
				%k.026 = phi i32 [ 0, %for.cond4.preheader ], [ %inc, %for.body6 ]
				%arrayidx8 = getelementptr inbounds [100 x [100 x [100 x i32]]], [100 x [100 x [100 x i32]]]* @D, i32 0, i32 %i.028, i32 %k.026, i32 %j.027
				%0 = load i32, i32* %arrayidx8
				%add = add nsw i32 %0, %t
				store i32 %add, i32* %arrayidx8
				%inc = add nuw nsw i32 %k.026, 1
				%exitcond = icmp eq i32 %inc, 100
				br i1 %exitcond, label %for.inc12, label %for.body6

				for.inc12: ; preds = %for.body6
				%inc13 = add nuw nsw i32 %j.027, 1
				%exitcond29 = icmp eq i32 %inc13, 100
				br i1 %exitcond29, label %for.inc15, label %for.cond4.preheader

				for.inc15: ; preds = %for.inc12
				%inc16 = add nuw nsw i32 %i.028, 1
				%exitcond30 = icmp eq i32 %inc16, 100
				br i1 %exitcond30, label %for.end17, label %for.cond1.preheader

				for.end17: ; preds = %for.inc15
				ret void
				}
				; CHECK-LABEL: @interchange_08
				; CHECK: entry:
				; CHECK: br label %for.cond1.preheader.preheader
				; CHECK: for.cond1.preheader.preheader: ; preds = %entry
				; CHECK: br label %for.cond1.preheader
				; CHECK: for.cond1.preheader: ; preds = %for.cond1.preheader.preheader, %for.inc15
				; CHECK: %i.028 = phi i32 [ %inc16, %for.inc15 ], [ 0, %for.cond1.preheader.preheader ]
				; CHECK: br label %for.body6.preheader
				; CHECK: for.cond4.preheader.preheader: ; preds = %for.body6
				; CHECK: br label %for.cond4.preheader
				; CHECK: for.cond4.preheader: ; preds = %for.cond4.preheader.preheader, %for.inc12
				; CHECK: %j.027 = phi i32 [ %inc13, %for.inc12 ], [ 0, %for.cond4.preheader.preheader ]
				; CHECK: br label %for.body6.split1
				; CHECK: for.body6.preheader: ; preds = %for.cond1.preheader
				; CHECK: br label %for.body6
				; CHECK: for.body6: ; preds = %for.body6.preheader, %for.body6.split
				; CHECK: %k.026 = phi i32 [ %inc, %for.body6.split ], [ 0, %for.body6.preheader ]
				; CHECK: br label %for.cond4.preheader.preheader
				; CHECK: for.body6.split1: ; preds = %for.cond4.preheader
				; CHECK: %arrayidx8 = getelementptr inbounds [100 x [100 x [100 x i32]]], [100 x [100 x [100 x i32]]]* @D, i32 0, i32 %i.028, i32 %k.026, i32 %j.027
				; CHECK: %0 = load i32, i32* %arrayidx8
				; CHECK: %add = add nsw i32 %0, %t
				; CHECK: store i32 %add, i32* %arrayidx8
				; CHECK: br label %for.inc12
				; CHECK: for.body6.split: ; preds = %for.inc12
				; CHECK: %inc = add nuw nsw i32 %k.026, 1
				; CHECK: %exitcond = icmp eq i32 %inc, 100
				; CHECK: br i1 %exitcond, label %for.inc15, label %for.body6
				; CHECK: for.inc12: ; preds = %for.body6.split1
				; CHECK: %inc13 = add nuw nsw i32 %j.027, 1
				; CHECK: %exitcond29 = icmp eq i32 %inc13, 100
				; CHECK: br i1 %exitcond29, label %for.body6.split, label %for.cond4.preheader
				; CHECK: for.inc15: ; preds = %for.body6.split
				; CHECK: %inc16 = add nuw nsw i32 %i.028, 1
				; CHECK: %exitcond30 = icmp eq i32 %inc16, 100
				; CHECK: br i1 %exitcond30, label %for.end17, label %for.cond1.preheader
				; CHECK: for.end17: ; preds = %for.inc15
				; CHECK: ret void

test/Transforms/LoopInterchange/profitability.ll

				; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s
				;; These test cases exercise the loop interchange profitability model.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = common global [100 x [100 x i32]] zeroinitializer
				@B = common global [100 x [100 x i32]] zeroinitializer

				;;---------------------------------------Test case 01---------------------------------
				;; Loop interchange enables vectorization of the inner loop and is therefore profitable. Check that the loops are interchanged.
				;; for(int i=1;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; A[j][i] = A[j - 1][i] + B[j][i];

				define void @interchange_01(i32 %N) {
				entry:
				%cmp27 = icmp sgt i32 %N, 1
				br i1 %cmp27, label %for.cond1.preheader.lr.ph, label %for.end16

				for.cond1.preheader.lr.ph:
				%0 = add i32 %N, -1
				br label %for.body3.preheader

				for.body3.preheader:
				%indvars.iv30 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next31, %for.inc14 ]
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.body3.preheader ]
				%1 = add nsw i64 %indvars.iv, -1
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %1, i64 %indvars.iv30
				%2 = load i32, i32* %arrayidx5
				%arrayidx9 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				%3 = load i32, i32* %arrayidx9
				%add = add nsw i32 %3, %2
				%arrayidx13 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				store i32 %add, i32* %arrayidx13
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc14, label %for.body3

				for.inc14:
				%indvars.iv.next31 = add nuw nsw i64 %indvars.iv30, 1
				%lftr.wideiv32 = trunc i64 %indvars.iv30 to i32
				%exitcond33 = icmp eq i32 %lftr.wideiv32, %0
				br i1 %exitcond33, label %for.end16, label %for.body3.preheader

				for.end16:
				ret void
				}
				;; Only part of the output is checked here, enough to verify that the loops are interchanged.
				; CHECK-LABEL: @interchange_01
				; CHECK: for.body3.preheader: ; preds = %for.inc14, %for.cond1.preheader.lr.ph
				; CHECK: %indvars.iv30 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next31, %for.inc14 ]
				; CHECK: br label %for.body3.split2

				; CHECK: for.body3.preheader1: ; preds = %entry
				; CHECK: br label %for.body3

				; CHECK: for.body3: ; preds = %for.body3.preheader1, %for.body3.split
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader1 ]
				; CHECK: br label %for.cond1.preheader.lr.ph

				; CHECK: for.body3.split2: ; preds = %for.body3.preheader
				; CHECK: %1 = add nsw i64 %indvars.iv, -1
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %1, i64 %indvars.iv30
				; CHECK: %2 = load i32, i32* %arrayidx5
				; CHECK: %arrayidx9 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				; CHECK: %3 = load i32, i32* %arrayidx9
				; CHECK: %add = add nsw i32 %3, %2
				; CHECK: %arrayidx13 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				; CHECK: store i32 %add, i32* %arrayidx13
				; CHECK: br label %for.inc14


				;; ---------------------------------------Test case 02---------------------------------
				;; Check the loop interchange profitability model.
				;; This tests the profitability model when the operands of getelementptr are not exactly the induction variables but
				;; arithmetic operations on them.
				;; for(int i=1;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; A[j-1][i-1] = A[j - 1][i-1] + B[j-1][i-1];

				define void @interchange_02(i32 %N) {
				entry:
				%cmp32 = icmp sgt i32 %N, 1
				br i1 %cmp32, label %for.cond1.preheader.lr.ph, label %for.end21

				for.cond1.preheader.lr.ph:
				%0 = add i32 %N, -1
				br label %for.body3.lr.ph

				for.body3.lr.ph:
				%indvars.iv35 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next36, %for.inc19 ]
				%1 = add nsw i64 %indvars.iv35, -1
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 1, %for.body3.lr.ph ], [ %indvars.iv.next, %for.body3 ]
				%2 = add nsw i64 %indvars.iv, -1
				%arrayidx6 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %2, i64 %1
				%3 = load i32, i32* %arrayidx6
				%arrayidx12 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @B, i64 0, i64 %2, i64 %1
				%4 = load i32, i32* %arrayidx12
				%add = add nsw i32 %4, %3
				store i32 %add, i32* %arrayidx6
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc19, label %for.body3

				for.inc19:
				%indvars.iv.next36 = add nuw nsw i64 %indvars.iv35, 1
				%lftr.wideiv38 = trunc i64 %indvars.iv35 to i32
				%exitcond39 = icmp eq i32 %lftr.wideiv38, %0
				br i1 %exitcond39, label %for.end21, label %for.body3.lr.ph

				for.end21:
				ret void
				}
				; CHECK-LABEL: @interchange_02
				; CHECK: for.body3.lr.ph: ; preds = %for.inc19, %for.cond1.preheader.lr.ph
				; CHECK: %indvars.iv35 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next36, %for.inc19 ]
				; CHECK: %0 = add nsw i64 %indvars.iv35, -1
				; CHECK: br label %for.body3.split1

				; CHECK: for.body3.preheader: ; preds = %entry
				; CHECK: %1 = add i32 %N, -1
				; CHECK: br label %for.body3

				; CHECK: for.body3: ; preds = %for.body3.preheader, %for.body3.split
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader ]
				; CHECK: br label %for.cond1.preheader.lr.ph

				; CHECK: for.body3.split1: ; preds = %for.body3.lr.ph
				; CHECK: %2 = add nsw i64 %indvars.iv, -1
				; CHECK: %arrayidx6 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %2, i64 %0
				; CHECK: %3 = load i32, i32* %arrayidx6
				; CHECK: %arrayidx12 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @B, i64 0, i64 %2, i64 %0
				; CHECK: %4 = load i32, i32* %arrayidx12
				; CHECK: %add = add nsw i32 %4, %3
				; CHECK: store i32 %add, i32* %arrayidx6
				; CHECK: br label %for.inc19


				;;---------------------------------------Test case 03---------------------------------
				;; Loop interchange is not profitable: the inner loop already accesses A and B with unit stride.
				;; for(int i=1;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; A[i-1][j-1] = A[i - 1][j-1] + B[i][j];

				define void @interchange_03(i32 %N){
				entry:
				%cmp31 = icmp sgt i32 %N, 1
				br i1 %cmp31, label %for.cond1.preheader.lr.ph, label %for.end19

				for.cond1.preheader.lr.ph:
				%0 = add i32 %N, -1
				br label %for.body3.lr.ph

				for.body3.lr.ph:
				%indvars.iv34 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next35, %for.inc17 ]
				%1 = add nsw i64 %indvars.iv34, -1
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 1, %for.body3.lr.ph ], [ %indvars.iv.next, %for.body3 ]
				%2 = add nsw i64 %indvars.iv, -1
				%arrayidx6 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %1, i64 %2
				%3 = load i32, i32* %arrayidx6
				%arrayidx10 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @B, i64 0, i64 %indvars.iv34, i64 %indvars.iv
				%4 = load i32, i32* %arrayidx10
				%add = add nsw i32 %4, %3
				store i32 %add, i32* %arrayidx6
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc17, label %for.body3

				for.inc17:
				%indvars.iv.next35 = add nuw nsw i64 %indvars.iv34, 1
				%lftr.wideiv37 = trunc i64 %indvars.iv34 to i32
				%exitcond38 = icmp eq i32 %lftr.wideiv37, %0
				br i1 %exitcond38, label %for.end19, label %for.body3.lr.ph

				for.end19:
				ret void
				}

				; CHECK-LABEL: @interchange_03
				; CHECK: for.body3.lr.ph:
				; CHECK: %indvars.iv34 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next35, %for.inc17 ]
				; CHECK: %1 = add nsw i64 %indvars.iv34, -1
				; CHECK: br label %for.body3.preheader
				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3
				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.body3.preheader ]
				; CHECK: %2 = add nsw i64 %indvars.iv, -1
				; CHECK: %arrayidx6 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %1, i64 %2
				; CHECK: %3 = load i32, i32* %arrayidx6
				; CHECK: %arrayidx10 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @B, i64 0, i64 %indvars.iv34, i64 %indvars.iv
				; CHECK: %4 = load i32, i32* %arrayidx10


Revision Contents

Diff 21271

include/llvm/InitializePasses.h

include/llvm/LinkAllPasses.h

include/llvm/Transforms/Scalar.h

lib/Analysis/DependenceAnalysis.cpp

lib/Transforms/IPO/PassManagerBuilder.cpp

lib/Transforms/Scalar/CMakeLists.txt

lib/Transforms/Scalar/LoopInterchange.cpp

lib/Transforms/Scalar/Scalar.cpp

test/Transforms/LoopInterchange/currentLimitation.ll

test/Transforms/LoopInterchange/interchange.ll

test/Transforms/LoopInterchange/profitability.ll
