This is an archive of the discontinued LLVM Phabricator instance.

[Patch] Loop Interchange Pass
ClosedPublic

Authored by karthikthecool on Feb 9 2015, 5:24 AM.

Details

Reviewers

jmolloy
hfinkel
pekka.jaaskelainen

Summary

Hi All,
Please find attached the patch for Loop Interchange Pass for llvm. Initial RFC and design was submitted at http://reviews.llvm.org/D7432 .
This pass is disabled by default.

To give a brief introduction, it consists of 3 stages:

LoopInterchangeLegality : Checks the legality of loop interchange based on distance/direction vector.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitability. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.

Current Limitation:

Only handles level 2 loops for now. Will extend it going forward to support any level of loops, as James had suggested during the RFC.
Triangular loops are not yet supported.

As Hal had suggested during the RFC, I went through the TSVC benchmark. Unfortunately I didn't get time to run it, but I went through the test cases for loop interchange. One of the test cases, s231(), which was not being vectorized previously, now gets vectorized. Added a similar test case in this patch.

This patch seems to be working fine and producing correct results (i.e. interchanging doesn't change the output of the program) to the best of my knowledge.

I would like some comments on how to go about writing test cases for this transform. Please let me know your inputs on this.
Also, is it OK to do further development on trunk once this patch is finalized?

Thanks and Regards
Karthik Bhat

Diff Detail

Event Timeline

karthikthecool retitled this revision to [Patch] Loop Interchange Pass. Feb 9 2015, 5:24 AM

karthikthecool updated this object.

karthikthecool edited the test plan for this revision. (Show Details)

karthikthecool added reviewers: hfinkel, jmolloy, pekka.jaaskelainen.

karthikthecool updated this revision to Diff 19574. Feb 9 2015, 5:24 AM

karthikthecool set the repository for this revision to rL LLVM.

karthikthecool added a subscriber: Unknown Object (MLST).

Please adjust all of the variable names to start with a capital letter, see: http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

As Hal had suggested during the RFC, I went through the TSVC benchmark. Unfortunately I didn't get time to run it, but I went through the test cases for loop interchange. One of the test cases, s231(), which was not being vectorized previously, now gets vectorized. Added a similar test case in this patch.

Great, thanks!

Also is it ok to do further development on trunk once this patch is finalized?

Yes, once this functionality is finalized, we'll move further development to trunk.

include/llvm/Transforms/Scalar.h
143	How about, "This pass interchanges loops to provide a more cache-friendly memory access patterns."
lib/Transforms/Scalar/LoopInterchange.cpp
50	Please add a comment explaining what this function computes?
55	Did you mean += ?
62	LoopInfo already has a getLoopDepth() function. Can you use that?
76	This is not actually what you want. If the loop is branched to by, for example, multiple entries of the switch statement, the predecessor can be listed multiple times in the predecessor list (and, thus, you'll have more than two incoming values even though you have only 2 predecessor blocks). I suspect that what you actually want is that there is a unique latch and a unique predecessor, so you want that L->getLoopLatch() && L->getLoopPredecessor() [neither are nullptr].
82	Do you also need to check that AddRec->isAffine()?
87	Let's say: // FIXME: Handle loops with more than one induction variable. Note that, currently, legality makes sure we have only one induction variable.
213	I'd move this FIXME comment somewhere else, it is not particularly useful here. It is more useful to tag places that assume only two levels.
262	Either you should handle the case where the dyn_cast fails, or if it can't fail (because we've already verified that this must be a BranchInst), then use cast<> instead. This same comment applies to many places below as well. Only use dyn_cast if the cast can fail (in which case you should handle the nullptr case). Otherwise, use cast<>.
274	Space before 'Any'
276	licm -> LICM
298	What are you actually trying to check here? Instructions with side effects? Maybe you want I->mayHaveSideEffects()?
310	These checks look identical to those above, please make a function (a lambda function is fine).
364	Can you include Src and Des in these debug messages so that we can see what instructions are relevant?
416	How are you checking for reductions here? Do you need to check that the one PHI you've found is not used outside of the loop?
447	Why are you only counting uses in the latch block? Should the increment be in some other block, then what?
460	Let's say, "Inner or outer loops lack a preheader"? Also, for the future, adding a preheader when one is not present is pretty easy (you just need to call InsertPreheaderForLoop from llvm/Transforms/Utils/LoopUtils.h), so this is a limitation that should be removed sooner rather than later (although after the initial commit is okay).
501	if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(UseInstr)) { ... }
503	What happens if it is not the IV directly, but some expression of the IV? I think you'd be better off using ScalarEvolution here, get the AddRec of the GEP, and see if the "outer" AddRec is provided in terms of the SCEV of the IV (or something like that).
551	Use SplitBlock from include/llvm/Transforms/Utils/BasicBlockUtils.h? (same for other functions below)?
726	Use llvm_unreachable, not assert(0 &&
788	Why?
818	PHIs are always at the beginning of the block; once you hit the first non-PHI, you can exit the loop (you should never find another).
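
To illustrate the last comment (line 818), here is a minimal, standalone sketch of scanning only the leading PHIs of a block. It does not use the LLVM API: `ToyInst` and `countLeadingPHIs` are hypothetical names made up for this illustration, and a block is modelled as a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an IR instruction (assumption: not the LLVM Instruction class).
struct ToyInst {
  bool IsPHI;
};

// Counts the PHI nodes of a block. PHIs are grouped at the top of a block,
// so the scan stops at the first non-PHI instead of visiting every instruction.
std::size_t countLeadingPHIs(const std::vector<ToyInst> &Block) {
  std::size_t NumPHIs = 0;
  for (const ToyInst &I : Block) {
    if (!I.IsPHI)
      break; // No PHI can follow a non-PHI in a well-formed block.
    ++NumPHIs;
  }
  return NumPHIs;
}
```

The early `break` is the whole point: the loop never walks past the PHI group, which is what the review comment asks for.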

Hi Hal,
Thanks for the review. Please find my comments updated below. Will upload modified patch shortly.

P.S. Sorry for the long comments,
Thanks and Regards
Karthik Bhat

include/llvm/Transforms/Scalar.h
143	Yes of course your comment makes more sense.
lib/Transforms/Scalar/LoopInterchange.cpp
50	This function gets the maximum nesting level of the innermost loop. We use this to push loops of depth 2 to the worklist. For example, for
	  for (int i = 0; i < N; i++)
	    for (int j = 0; j < M; j++)
	      for (int k = 0; k < K; k++)
	we want to return 3, as the max nesting level is 3. I have renamed the function, added a comment, and modified the function a bit to correctly return the max loop depth when there are multiple inner loops. For example:
	  for (int i = 0; i < N; i++) {
	    for (int j = 0; j < M; j++) {
	      for (int k = 0; k < K; k++) {
	        // this loop has depth 3
	      }
	    }
	    for (int k = 0; k < K; k++) {
	      // this loop has depth 2
	    }
	  }
	In the above case we still return 3, as it is the max depth.
62	getLoopDepth currently only returns the nesting level of the current loop. Since we have access to outer loop here we always get nesting level as 1. So had to go with the recursive function above.
76	Updated the code. Thanks for clarifying the problem.
82	Yes, I feel we need to check isAffine as well. Thanks, updated the code.
87	OK. Done.
213	OK..
262	Updated code to use cast<> wherever possible. Added null checks in places where dyn_cast is being used.
274	Modified comment.
276	Modified comment.
298	Hi Hal, the way I'm trying to conclude that a loop is tightly nested is as follows.
	First, there should not be any extra block between the outer loop and the inner loop (i.e. the outer loop header branches to the inner loop preheader or inner loop body, and the other branch in the header goes to the outer loop latch). With this check we can catch loops which have a block in between the outer and inner loop, such as
	  for (int i = 0; i < N; i++) {
	    if (X) {
	    }
	    for (int j = 0; j < N; j++) {
	    }
	  }
	and conclude that these are not tightly nested.
	The second type of non-tightly-nested loop is
	  for (int i = 0; i < N; i++) {
	    a = i;
	    k = A[i];
	    for (int j = 0; j < N; j++) {
	    }
	  }
	These loops are caught by the second check, which checks that we have a single use of the indvar in the latch or header, namely the operand to the induction PHI (i.e. the one used to increment/decrement the loop counter). I have modified this function a bit in the updated patch.
	In the third case I was trying to catch loops such as
	  for (int i = 0; i < N; i++) {
	    foo();
	    for (int j = 0; j < N; j++) {
	    }
	  }
	I think we can do it using a combination of mayHaveSideEffects and mayReadFromMemory. Updated the patch.
310	Done.
364	Done.
416	We are currently checking that there is only one PHI node in the loop header, which corresponds to the induction variable. If we find any other PHIs, due either to reductions or to a triangular loop structure, we currently exit; this is a current limitation.
447	We count the uses in the loop latch only, as we split the latch based on this instruction. This was done because the mentioned example generated code such as
	  for.body3: ; preds = %for.body3, %for.body3.lr.ph
	    %j.018 = phi i32 [ 0, %for.body3.lr.ph ], [ %add6, %for.body3 ]
	    %arrayidx4 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %j.018, i32 %i.020
	    %5 = load i32* %arrayidx4, align 4, !tbaa !1
	    %add = add nsw i32 %3, %5
	    %add6 = add nuw nsw i32 %j.018, 1
	    %arrayidx8 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %add6, i32 %add5
	    store i32 %add, i32* %arrayidx8, align 4, !tbaa !1
	    %exitcond = icmp eq i32 %j.018, %4
	    br i1 %exitcond, label %for.inc9.loopexit, label %for.body3
	Since we cannot split at %add6 = add nuw nsw i32 %j.018, 1, we give up in this case. But now that I think about it, counting uses may not be the right method to check whether we can split the inner loop latch. Consider the following valid loop, where we fail with this check:
	  for (int i = 0; i < 100; i++)
	    for (int j = 0; j < 100; j++)
	      A[j][i] = A[j][i] + k;
	Here we get the inner loop latch as
	  for.body3: ; preds = %for.body3, %for.cond1.preheader
	    %j.015 = phi i32 [ 0, %for.cond1.preheader ], [ %inc, %for.body3 ]
	    %arrayidx4 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %j.015, i32 %i.016
	    %1 = load i32* %arrayidx4, align 4, !tbaa !1
	    %add = add nsw i32 %0, %1
	    store i32 %add, i32* %arrayidx4, align 4, !tbaa !1
	    %inc = add nuw nsw i32 %j.015, 1
	    %exitcond = icmp eq i32 %inc, 100
	    br i1 %exitcond, label %for.inc7, label %for.body3
	This could have been split at %inc = add nuw nsw i32 %j.015, 1, but we fail because we find more than one use. Modified the logic to check for a tightly grouped inner loop latch which can be split.
460	Updated code to add a preheader when not present.
501	Done.
503	Updated the code to use SCEV to find the loop from which the operand comes, in order to decide whether it is a good or bad load. After the change we are able to handle code like
	  for (i = 0; i < N; i += 1)
	    for (j = 0; j < N; j++)
	      A[j-1][i-1] = A[j-1][i-1] + C[j-1][i-1];
	which now gets vectorized after interchange.
551	Updated code.
726	Done.
818	Yes you are right. Modified code.
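
The reply to the line 50 comment above describes a recursive maximum-depth computation. A minimal standalone sketch of that idea follows; `ToyLoop` and `maxLoopDepth` are hypothetical names for illustration, not the function in the patch and not LLVM's `Loop` class.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy loop-nest node: a loop simply owns its directly nested sub-loops
// (assumption: a stand-in for the real loop tree).
struct ToyLoop {
  std::vector<ToyLoop> SubLoops;
};

// Returns the depth of the deepest loop nest rooted at L, counting L itself
// as depth 1. With several sibling inner loops, the deepest one wins.
unsigned maxLoopDepth(const ToyLoop &L) {
  unsigned Deepest = 0;
  for (const ToyLoop &Sub : L.SubLoops)
    Deepest = std::max(Deepest, maxLoopDepth(Sub));
  return 1 + Deepest;
}
```

For the second example in that reply (an i loop containing a j loop with a nested k loop, plus a sibling k loop), this sketch returns 3, matching the behaviour described.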

Hi Hal,
Thanks a lot for the review. Updated the patch to address review comments. Also fixed a few issues which I found during testing.
Major changes include-

Logic to calculate profitability has been made more accurate.
Logic to detect where to split the inner loop has been changed to be more accurate.
Added a test case to check the updated profitability model.

Please let me know your inputs on this.

This still needs major work to support generic loop depths and improved profitability model.
Hopefully will be able to complete it with help from the community.

Thanks once again for the support.

Regards
Karthik Bhat

hfinkel added inline comments. Feb 12 2015, 10:42 PM

lib/Transforms/Scalar/LoopInterchange.cpp
416	No, I mean uses outside of the loops in general. I don't think you check for that. You check for:
	  if (numUsageinLatch + numUsageinHeader != 1)
	    return false;
	but the PHI could be used in any block dominated by the loop. Do you need to check for that?
	  int i, j;
	  for (i = 0; i < n; ++i)
	    for (j = 0; j < m; ++j)
	      a[i][j] = 7;
	  cout << "final i, j = " << i << ", " << j << "\n";

karthikthecool added inline comments. Feb 13 2015, 1:59 AM

lib/Transforms/Scalar/LoopInterchange.cpp
416	Hi Hal, the way I was handling this was: since the loops were tightly coupled, we were getting the LCSSA PHI for these loops in the outer loop latch, which I was splitting and moving outside the loop. I was able to get the correct values for i and j in this case. But I think I can add a check to avoid these cases as well. Since we do not want uses outside the loop, is it OK to have a check like
	  if (isa<PHINode>(InnerLoopLatch->begin()))
	    return false;
	  if (isa<PHINode>(OuterLoopLatch->begin()))
	    return false;
	This will make sure we do not have any outside uses defined inside the loop. Does this check look good? Will it make this transform too restrictive? Thanks for answering my silly queries; I'm still getting hold of loop optimizations. Regards, Karthik Bhat

hfinkel added inline comments. Feb 13 2015, 2:27 AM

lib/Transforms/Scalar/LoopInterchange.cpp
416	Ah, you're right. I think that, given our current restrictions, the final values outside the loop nest will always be the same, so this is fine. (we should have a regression test showing that we still can interchange in this case).
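
The claim that the final induction values are unchanged can be checked on a small source-level example. The sketch below is only an illustration under stated assumptions: constant, non-zero trip counts and a tightly nested body, with hypothetical function names `runOriginal` and `runInterchanged`. It says nothing about how the pass rewrites the IR.

```cpp
#include <cassert>
#include <utility>

constexpr int N = 4; // outer trip count (assumed non-zero)
constexpr int M = 3; // inner trip count (assumed non-zero)

// Original order: i outer, j inner. Returns the values of i and j that are
// visible after the loop nest has finished.
std::pair<int, int> runOriginal(int (&A)[N][M]) {
  int i = 0, j = 0;
  for (i = 0; i < N; ++i)
    for (j = 0; j < M; ++j)
      A[i][j] = 7;
  return {i, j};
}

// Interchanged order: j outer, i inner. The body is unchanged.
std::pair<int, int> runInterchanged(int (&A)[N][M]) {
  int i = 0, j = 0;
  for (j = 0; j < M; ++j)
    for (i = 0; i < N; ++i)
      A[i][j] = 7;
  return {i, j};
}
```

In both orders each counter runs up to its own bound on the last pass through its loop, so the values seen after the nest are N and M either way. With a zero trip count the two orders could differ, which is why the assumption is stated.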

Hi Hal,
Updated the tests to add a test case covering the case where we have a use of the PHI outside the loop. I have added gi, gj as global variables and used them as induction variables in the loop to simulate this case.

It would be great if you could give me some inputs on writing test cases for this pass.
Currently the test cases I have added are all more or less similar (i.e. they get vectorized after interchange).
For checking loops that are just interchanged but not vectorized, do we have to check the exact instructions after interchange, or maybe check the PHI instruction order in the .ll (after interchange the induction PHIs will be in the reverse order)?

Thanks
Karthik Bhat

In D7499#123108, @karthikthecool wrote:

Hi Hal,
Updated the tests to add a test case covering the case where we have a use of the PHI outside the loop. I have added gi, gj as global variables and used them as induction variables in the loop to simulate this case.

It would be great if you could give me some inputs on writing test cases for this pass.
Currently the test cases I have added are all more or less similar (i.e. they get vectorized after interchange).
For checking loops that are just interchanged but not vectorized, do we have to check the exact instructions after interchange, or maybe check the PHI instruction order in the .ll (after interchange the induction PHIs will be in the reverse order)?

Good point; I'd not looked carefully at the tests yet. We should not test this pass by using the vectorizer, but rather, should test the output of the interchange pass directly.

We don't need to make this harder than necessary, but, I think that for a few representative cases, we should check all of the relevant parts of the output. Then for cases that are structurally similar to those, checking the PHI order (or some other signature of the interchange) is fine.

We also should have negative tests (some loops that aren't quite tightly nested, maybe with some function call or extra memory access, etc.) and make sure they're not interchanged. We should also add tests for current limitations (like that loops with reductions are not interchanged), and put in some FIXME comments stating that these are just current limitations.

Feel free to borrow IR from the files in test/Analysis/DependenceAnalysis and adapt them as tests here.

Hi Hal,
Sorry for the delay in following up. I was on vacation.
Please find the updated and rebased patch. Added test cases as per your suggestion. I also verified the output of the programs on randomly generated arrays, and the outputs are the same before and after interchange in cases where loops are interchanged.

I will start to work on generic version of loop interchange (i.e. to support loops of any depth) after the initial version is committed.

Please let me know your inputs on this. Thanks again for your time and review. I really appreciate your help.

Regards
Karthik Bhat

hfinkel added inline comments. Feb 19 2015, 6:46 PM

test/Transforms/LoopInterchange/vectorize.ll
1 ↗	(On Diff #20150)	Don't run the vectorizer here. Just run interchange, verify that it does what it should, and if you want end-to-end coverage through the vectorizer, add a vectorizer regression test for the interchanged loops (likely, in the subfolder of the vectorizer's regression tests for your target of interest).

Hi Hal,
Thanks for the review.
Please find the updated patch. Moved tests from vectorize.ll to profitability.ll and interchange.ll. Checking for loop interchange as per comments. Also added a negative test case to check the profitability model (i.e. where it is legal but not profitable to interchange).

Please let me know if this looks good for initial commit. This pass is currently disabled by default.

Thanks and Regards
Karthik Bhat

Hi Hal,
I had some time to work on generic version of loop interchange to support any depth. This updated patch supports loops of any depth.
The loop selection algorithm currently selects the innermost loop for interchange. Going forward we can improve this heuristic to select the most profitable loop based on Dependency matrix.
To keep it simple in the first version loops with LCSSA phi are currently not handled. I will work on handling them in later iterations.
The legality and profitability logic is pretty much the same. We use dependency matrix to conclude legality of interchange of 2 loops.
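
As a rough illustration of how a dependency matrix can decide legality, the sketch below applies the textbook rule: swap the two loops' columns in every direction vector and reject the interchange if any vector then has '>' as its first non-'=' entry. This is an assumption-laden sketch, not the code in the patch. `DirectionMatrix`, `isLexicographicallyNonNegative` and `isLegalToInterchange` are hypothetical names, and only the entries '<', '=' and '>' are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One row per dependence, one column per loop level (outermost first).
// Each entry is '<', '=' or '>' (assumption: no other entry kinds).
using DirectionMatrix = std::vector<std::string>;

// A direction vector is acceptable if its first non-'=' entry is '<',
// or if every entry is '='.
bool isLexicographicallyNonNegative(const std::string &Row) {
  for (char D : Row) {
    if (D == '<')
      return true;
    if (D == '>')
      return false;
  }
  return true;
}

// Interchanging loops LoopA and LoopB permutes the matching columns of every
// row; the interchange is legal only if all permuted rows stay acceptable.
// Deps is taken by value so the caller's matrix is left untouched.
bool isLegalToInterchange(DirectionMatrix Deps, std::size_t LoopA,
                          std::size_t LoopB) {
  for (std::string &Row : Deps) {
    std::swap(Row[LoopA], Row[LoopB]);
    if (!isLexicographicallyNonNegative(Row))
      return false;
  }
  return true;
}
```

For example, a single dependence with directions `<` `>` blocks interchanging the two loops, because the swapped vector would start with `>`, while `=` `<` permits it.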

One of the TSVC benchmark test cases (s231) gives a 2x improvement with this patch.

I ran the LLVM LNT performance tests based on http://llvm.org/docs/lnt/quickstart.html#running-tests with a sample size of 3, but every time I see a lot of variation in the results. I will try to run LNT with a larger sample size and update the results here.

It would be great if you could let me know your inputs on this patch.

P.S. Are there any build bots which we can use to run llvm lnt/performance tests for this patch?

Thanks and Regards
Karthik Bhat

I had some time to work on generic version of loop interchange to support any depth. This updated patch supports loops of any depth.

Nice! A few comments...

include/llvm/Transforms/Scalar.h
144	You've un-improved this comment; please change it back.
lib/Transforms/IPO/PassManagerBuilder.cpp
262 ↗	(On Diff #20478)	I'd certainly like to have this on by default eventually, but we should be more conservative at first. Please add a command-line flag to enable this (there are several in this file already), so we can do further testing.
lib/Transforms/Scalar/LoopInterchange.cpp
12	You should use the 'cache-friendly memory access pattern' terminology here too.
51	Is this matrix generally sparse? (or could we make it sparse by picking some default). If so, is this the right data structure?
75	I'm somewhat worried about doing this eagerly for all loops; what if they're really large with lots of memory accesses? Maybe we should have a cutoff?
653	No need for { } here.
856	I don't really understand this comment. I think we can assume that LICM has run first. (and if this pass detects loop-invariant code better than LICM, that is another problem to fix, but not here).
917	No need for the { }

Hi Hal,
Please find my comments inline. Updated the patch as per review comments and fixed few issues found during llvm lnt regression.
The current version of loop interchange gives some 30% improvement in execution time in 2 benchmarks. This is because they contain code fragments like:

for (i = 0; i < _PB_N; i++)
 for (j = 0; j < _PB_N; j++)
   x[i] = x[i] + beta * A[j][i] * y[j];

which gets benefited after interchange.
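
To make the benefit concrete, here is a source-level sketch of the fragment in both loop orders. It is an illustration only: the function names are hypothetical, `N` is taken from the vector size rather than `_PB_N`, and the matrix is assumed to be stored row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Original order: for a fixed i, the inner loop reads A[0][i], A[1][i], ...
// which walks down a column and touches a different row on every iteration.
void updateOriginal(std::vector<double> &X, const Matrix &A,
                    const std::vector<double> &Y, double Beta) {
  const std::size_t N = X.size();
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      X[i] = X[i] + Beta * A[j][i] * Y[j];
}

// Interchanged order: for a fixed j, the inner loop reads A[j][0], A[j][1], ...
// which are contiguous, and X[i] is accessed with unit stride as well.
void updateInterchanged(std::vector<double> &X, const Matrix &A,
                        const std::vector<double> &Y, double Beta) {
  const std::size_t N = X.size();
  for (std::size_t j = 0; j < N; ++j)
    for (std::size_t i = 0; i < N; ++i)
      X[i] = X[i] + Beta * A[j][i] * Y[j];
}
```

Each `X[i]` still receives its contributions in increasing `j` order in both versions, so the results match; only the memory access pattern of the inner loop changes.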
There are a few compile-time regressions, which may be due to the heavy legality checks in loop interchange. I will try to fix this in the next iteration.
Apart from this, we found a crash in the Dependency Analysis module, which I'm planning to fix separately as I need to understand it in more detail. Will raise a bug on the same.

Thanks and Regards
Karthik Bhat

lib/Transforms/IPO/PassManagerBuilder.cpp
262 ↗	(On Diff #20478)	Sure. I had added it for testing performance and forgot to revert it before check-in. Will add a command line flag and disable it by default.
lib/Transforms/Scalar/LoopInterchange.cpp
12	Done.
51	Yes this matrix can be sparse depending on the dependence carried by the loop. I will check more on this front. Have added a TODO for now.
75	Makes sense. Added a limit of max 10 loops(Columns in the dependency matrix) and 100 dependencies(Rows of the dependency matrix).
653	Done.
856	Hi Hal, consider the below matrix multiplication code:
	  for (int i = 0; i < N; i++)
	    for (int j = 0; j < N; j++)
	      for (int k = 0; k < N; k++)
	        A[i][j] = A[i][j] + B[i][k] * C[k][j];
	In this example the direction vector would be [= = |<] (i.e. '=' dependency in i, '=' dependency in j, and a loop-independent dependency in k). The LICM pass would move the getelementptr for A[i][j] outside the inner loop, but it cannot move the complete statement outside the inner loop. Since the vectorizer only works on the inner loop, the above code is not vectorized for i, j. But if we interchange the loops to
	  for (int k = 0; k < N; k++)
	    for (int i = 0; i < N; i++)
	      for (int j = 0; j < N; j++)
	        A[i][j] = A[i][j] + B[i][k] * C[k][j];
	the loop now gets vectorized. It is mostly profitable to keep loop-independent dependencies such as the above at the outermost possible level. We try to achieve the same here.
917	Done.
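
The cutoff mentioned in the reply to line 75 (at most 10 loops as columns and 100 dependencies as rows) could look roughly like the sketch below. The names `MaxLoopNestDepth`, `MaxDependencies` and `populateWithCutoff` are hypothetical, and the rows are supplied by the caller rather than computed by a dependence analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Limits taken from the reply above; the constant names are assumptions.
constexpr std::size_t MaxLoopNestDepth = 10; // columns of the matrix
constexpr std::size_t MaxDependencies = 100; // rows of the matrix

// Copies candidate direction vectors into DepMatrix, giving up as soon as
// either limit would be exceeded. Returns false (and leaves DepMatrix empty)
// when the loop nest is considered too expensive to analyse.
bool populateWithCutoff(const std::vector<std::string> &Candidates,
                        std::vector<std::string> &DepMatrix) {
  DepMatrix.clear();
  for (const std::string &Row : Candidates) {
    if (Row.size() > MaxLoopNestDepth ||
        DepMatrix.size() >= MaxDependencies) {
      DepMatrix.clear();
      return false;
    }
    DepMatrix.push_back(Row);
  }
  return true;
}
```

Bailing out early keeps compile time bounded on very large loop bodies, at the cost of skipping some nests that might have been interchangeable.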

Hi Hal,
Please find the updated patch attached. Addressed review comments and fixed a few issues in Loop Interchange found during the LLVM LNT regression run. After this change we find some improvement in the execution time of 2 benchmark test cases, as shown in the previous post. There are a few issues in Dependency Analysis, as you mentioned; I'm planning to address them separately after looking into the module in more detail. Hope that is fine?
Please let me know your inputs on the patch.
Thanks for your time and help.
Regards
Karthik Bhat

Hi All,
Rebased to trunk and updated the test cases to reflect recent changes in the IR format.

A patch to fix the crash in Dependency Analysis mentioned above has been submitted at D8059. With this patch we do not see any failures in LLVM LNT. As mentioned in previous comments, we see execution time improvements in 2 tests and compile-time regressions in a few test cases.

Please let me know if this is good for an initial check-in with the pass disabled by default. We still have some work to do in this pass, e.g.:

Add support for reductions and lcssa phi.
Improve profitability model.
Improve loop selection algorithm.
Improve compile time regression found in llvm lnt due to this pass.
Fix issues in Dependency Analysis module.

I would like to address them one by one on trunk if everyone is OK with it.
Awaiting response.
Regards
Karthik Bhat

Thanks for continuing to work on this. I have a few minor comments below, but we can move this in-tree. Please go ahead and commit, and we'll continue to iterate/test. When you commit, please commit the change to lib/Analysis/DependenceAnalysis.cpp separately.

lib/Transforms/Scalar/LoopInterchange.cpp
516	ENABLE_DEBUGGING is too generic for this. How about calling this: DUMP_DEP_MATRICIES
548	Please make this TODO more specific. What happens now and what should happen instead?
689	Please make this more specific. We do handle anti deps. What needs to happen?

This revision is now accepted and ready to land. Mar 5 2015, 7:23 PM

Thanks Hal. Committed as r231458 after implementing review comments. Will raise a review for DependencyAnalysis fix shortly.
Thanks and Regards
Karthik Bhat

karthikthecool mentioned this in D7432: [RFC] Loop Interchane Pass. Mar 8 2015, 9:52 PM

Revision Contents

Path	Size
include/llvm/InitializePasses.h	1 line
include/llvm/LinkAllPasses.h	1 line
include/llvm/Transforms/Scalar.h	7 lines
lib/Transforms/Scalar/CMakeLists.txt	1 line
lib/Transforms/Scalar/LoopInterchange.cpp	876 lines
lib/Transforms/Scalar/Scalar.cpp	1 line
test/Transforms/LoopInterchange/currentLimitation.ll	165 lines
test/Transforms/LoopInterchange/interchange.ll	548 lines
test/Transforms/LoopInterchange/profitability.ll	197 lines

Diff 20374

include/llvm/InitializePasses.h

 void initializeLiveRegMatrixPass(PassRegistry&);
 void initializeLiveStacksPass(PassRegistry&);
 void initializeLiveVariablesPass(PassRegistry&);
 void initializeLoaderPassPass(PassRegistry&);
 void initializeLocalStackSlotPassPass(PassRegistry&);
 void initializeLoopDeletionPass(PassRegistry&);
 void initializeLoopExtractorPass(PassRegistry&);
 void initializeLoopInfoWrapperPassPass(PassRegistry&);
+void initializeLoopInterchangePass(PassRegistry &);
 void initializeLoopInstSimplifyPass(PassRegistry&);
 void initializeLoopRotatePass(PassRegistry&);
 void initializeLoopSimplifyPass(PassRegistry&);
 void initializeLoopStrengthReducePass(PassRegistry&);
 void initializeGlobalMergePass(PassRegistry&);
 void initializeLoopRerollPass(PassRegistry&);
 void initializeLoopUnrollPass(PassRegistry&);
 void initializeLoopUnswitchPass(PassRegistry&);

include/llvm/LinkAllPasses.h

 ForcePassLinking() {
 (void) llvm::createInstructionCombiningPass();
 (void) llvm::createInternalizePass();
 (void) llvm::createJumpInstrTableInfoPass();
 (void) llvm::createJumpInstrTablesPass();
 (void) llvm::createLCSSAPass();
 (void) llvm::createLICMPass();
 (void) llvm::createLazyValueInfoPass();
 (void) llvm::createLoopExtractorPass();
+(void)llvm::createLoopInterchangePass();
 (void) llvm::createLoopSimplifyPass();
 (void) llvm::createLoopStrengthReducePass();
 (void) llvm::createLoopRerollPass();
 (void) llvm::createLoopUnrollPass();
 (void) llvm::createLoopUnswitchPass();
 (void) llvm::createLoopIdiomPass();
 (void) llvm::createLoopRotatePass();
 (void) llvm::createLowerExpectIntrinsicPass();

include/llvm/Transforms/Scalar.h

 //===----------------------------------------------------------------------===//
 //
 // LICM - This pass is a loop invariant code motion and memory promotion pass.
 //
 Pass *createLICMPass();

 //===----------------------------------------------------------------------===//
 //
+// LoopInterchange - This pass interchanges loops to provide a more
	hfinkel: How about, "This pass interchanges loops to provide a more cache-friendly memory access patterns."
	karthikthecool: Yes of course your comment makes more sense.
+// cache-friendly memory access patterns.
	hfinkel: You've un-improved this comment; please change it back.
+//
+Pass *createLoopInterchangePass();
+
+//===----------------------------------------------------------------------===//
+//
 // LoopStrengthReduce - This pass is strength reduces GEP instructions that use
 // a loop's canonical induction variable as one of their indices.
 //
 Pass *createLoopStrengthReducePass();

 Pass *createGlobalMergePass(const TargetMachine *TM = nullptr);

 //===----------------------------------------------------------------------===//

lib/Transforms/Scalar/CMakeLists.txt

 add_llvm_library(LLVMScalarOpts
 InductiveRangeCheckElimination.cpp
 IndVarSimplify.cpp
 JumpThreading.cpp
 LICM.cpp
 LoadCombine.cpp
 LoopDeletion.cpp
 LoopIdiomRecognize.cpp
 LoopInstSimplify.cpp
+LoopInterchange.cpp
 LoopRerollPass.cpp
 LoopRotation.cpp
 LoopStrengthReduce.cpp
 LoopUnrollPass.cpp
 LoopUnswitch.cpp
 LowerAtomic.cpp
 LowerExpectIntrinsic.cpp
 MemCpyOptimizer.cpp

lib/Transforms/Scalar/LoopInterchange.cpp

				//===- LoopInterchange.cpp - Loop interchange pass------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This Pass handles loop interchange transform. This interchanges the inner
				// and outer loop if interchanging can result in better cache hits.
				//
				hfinkel: You should use the 'cache-friendly memory access pattern' terminology here too.
				karthikthecool: Done.
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/Analysis/AliasSetTracker.h"
				#include "llvm/Analysis/AssumptionCache.h"
				#include "llvm/Analysis/BlockFrequencyInfo.h"
				#include "llvm/Analysis/CodeMetrics.h"
				#include "llvm/Analysis/DependenceAnalysis.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/LoopIterator.h"
				#include "llvm/Analysis/LoopPass.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/ScalarEvolutionExpander.h"
				#include "llvm/Analysis/ScalarEvolutionExpressions.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Transforms/Utils/SSAUpdater.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"
				using namespace llvm;

				#define DEBUG_TYPE "loop-interchange"

				namespace {

				typedef std::pair<Loop *, Loop *> LoopPair;
				class LoopInterchange;
				// Returns the maximum inner loop depth starting from L this is used to populate
				hfinkel: Please add a comment explaining what this function computes?
				karthikthecool: This function gets the maximum nesting level of the innermost loop. We use this to push loops of depth 2 to the worklist. For example, for
				    for(int i=0;i<N;i++)
				      for(int j=0;j<M;j++)
				        for(int k=0;k<K;k++)
				we want to return 3, as the max nesting level is 3. I have renamed the function and added a comment. I also modified the function a bit to correctly return the max loop depth in case we have multiple inner loops. For example:
				    for(int i=0;i<N;i++) {
				      for(int j=0;j<M;j++) {
				        for(int k=0;k<K;k++) {
				          // this loop has depth 3
				        }
				      }
				      for(int k=0;k<K;k++) {
				        // this loop has depth 2
				      }
				    }
				In the above case we still return 3, as it is the max depth.
				// worklist with loops of depth 2.
				hfinkel: Is this matrix generally sparse? (or could we make it sparse by picking some default). If so, is this the right data structure?
				karthikthecool: Yes, this matrix can be sparse depending on the dependence carried by the loop. I will check more on this front. Have added a TODO for now.
				unsigned getMaxNestingLevel(Loop &L) {
				unsigned Level = 0;
				if (L.empty())
				return 1;
				hfinkel: Did you mean += ?
				for (Loop *InnerL : L)
				Level = std::max(Level, getMaxNestingLevel(*InnerL) + 1);
				return Level;
				}
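The depth computation discussed in the thread above can be tried outside LLVM. The following is a minimal sketch, assuming a hypothetical `LoopNode` type that stands in for `llvm::Loop` (it is not part of the patch); it builds the two-inner-loop nest from the author's reply and reports the deepest level.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for llvm::Loop: a node that owns its sub-loops.
struct LoopNode {
  std::vector<std::unique_ptr<LoopNode>> SubLoops;

  // Adds a directly nested loop and returns a pointer to it.
  LoopNode *addSubLoop() {
    SubLoops.push_back(std::make_unique<LoopNode>());
    return SubLoops.back().get();
  }
};

// Depth of the deepest loop in the nest rooted at L, counting L itself as 1.
unsigned maxNestingLevel(const LoopNode &L) {
  unsigned Deepest = 0;
  for (const auto &Sub : L.SubLoops)
    Deepest = std::max(Deepest, maxNestingLevel(*Sub));
  return Deepest + 1;
}

// Builds the nest from the review thread: i { j { k }  k }.
unsigned depthOfReviewExample() {
  LoopNode I;
  LoopNode *J = I.addSubLoop();
  J->addSubLoop(); // k inside j: depth 3
  I.addSubLoop();  // second k directly inside i: depth 2
  return maxNestingLevel(I);
}
```

Taking the maximum over the sub-loops, rather than accumulating with `+=`, is what keeps the second, shallower `k` loop from inflating the result.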

				static void populateWorklist(Loop &L, SmallVector<LoopPair, 8> &V) {
				// TODO: Currently only handled for loops depth of 2.
				hfinkel: LoopInfo already has a getLoopDepth() function. Can you use that?
				karthikthecool: getLoopDepth currently only returns the nesting level of the current loop. Since we have access to the outer loop here, we always get a nesting level of 1. So I had to go with the recursive function above.
				unsigned SubLoopsize = L.getSubLoops().size();
				unsigned Count = getMaxNestingLevel(L);
				if (Count == 2 && SubLoopsize == 1) {
				Loop *Inner;
				for (Loop *InnerL : L)
				Inner = InnerL;

				V.push_back(std::make_pair(&L, Inner));
				}
				}

				static PHINode *getInductionVariable(Loop *L, ScalarEvolution *SE) {
				PHINode *InnerIndexVar = L->getCanonicalInductionVariable();
				hfinkel: I'm somewhat worried about doing this eagerly for all loops; what if they're really large with lots of memory accesses? Maybe we should have a cutoff?
				karthikthecool: Makes sense. Added a limit of max 10 loops (columns in the dependency matrix) and 100 dependencies (rows of the dependency matrix).
				if (InnerIndexVar)
				hfinkel: This is not actually what you want. If the loop is branched to by, for example, multiple entries of the switch statement, the predecessor can be listed multiple times in the predecessor list (and, thus, you'll have more than two incoming values even though you have only 2 predecessor blocks). I suspect that what you actually want is that there is a unique latch and a unique predecessor, so you want that L->getLoopLatch() && L->getLoopPredecessor() [neither are nullptr].
				karthikthecool: Updated the code. Thanks for clarifying the problem.
				return InnerIndexVar;
				if (L->getLoopLatch() == nullptr || L->getLoopPredecessor() == nullptr)
				return nullptr;
				for (BasicBlock::iterator I = L->getHeader()->begin(); isa<PHINode>(I); ++I) {
				PHINode *PhiVar = cast<PHINode>(I);
				const SCEVAddRecExpr *AddRec =
				hfinkel: Do you also need to check that AddRec->isAffine()?
				karthikthecool: Yes, I feel we need to check isAffine as well. Thanks, updated the code.
				dyn_cast<SCEVAddRecExpr>(SE->getSCEV(PhiVar));
				if (!AddRec || !AddRec->isAffine())
				continue;
				const SCEV *Step = AddRec->getStepRecurrence(*SE);
				const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);
				hfinkel: Let's say: // FIXME: Handle loops with more than one induction variable. Note that, currently, legality makes sure we have only one induction variable.
				karthikthecool: OK. Done.
				if (!C)
				continue;
				// Found the induction variable.
				// FIXME: Handle loops with more than one induction variable. Note that,
				// currently, legality makes sure we have only one induction variable.
				return PhiVar;
				}
				return nullptr;
				}
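The helper above accepts a header PHI only when its SCEV is an affine add recurrence with a constant step, usually written `{Start,+,Step}`. As a rough illustration of what that property means, the sketch below models such a recurrence with a hypothetical `AffineAddRec` struct (not an LLVM type) and checks that its closed form agrees with the values an ordinary counted loop produces.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an affine add recurrence {Start,+,Step}: the value on
// iteration N of the loop is Start + N * Step.
struct AffineAddRec {
  std::int64_t Start;
  std::int64_t Step;

  std::int64_t valueAtIteration(std::int64_t N) const {
    return Start + N * Step;
  }
};

// Steps a variable the way a loop would (V += Step each iteration) and checks
// that every value matches the closed form.
bool matchesLoop(const AffineAddRec &AR, std::int64_t TripCount) {
  assert(TripCount >= 0 && "trip count must be non-negative");
  std::int64_t V = AR.Start;
  for (std::int64_t N = 0; N < TripCount; ++N, V += AR.Step)
    if (V != AR.valueAtIteration(N))
      return false;
  return true;
}
```

A PHI whose step changes from iteration to iteration has no such closed form, which is one reason the `isAffine()` check requested in the review matters.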

				/// LoopInterchangeLegality checks if it is legal to interchange the loop.
				class LoopInterchangeLegality {
				public:
				LoopInterchangeLegality(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
				DependenceAnalysis *DA, LoopInterchange *Pass)
				: OuterLoop(Outer), InnerLoop(Inner), SE(SE), DA(DA), Parent(Pass) {}

				/// Check if the loops can be interchanged.
				bool canInterchangeLoops();

				bool currentLimitations();

				private:
				bool checkDependence(Loop *Outer, DependenceAnalysis *DA);
				bool tightlyNested(Loop *Outer, Loop *Inner);

				Loop *OuterLoop;
				Loop *InnerLoop;

				/// Scev analysis.
				ScalarEvolution *SE;
				/// Dependence analysis.
				DependenceAnalysis *DA;
				LoopInterchange *Parent;
				};

				/// LoopInterchangeProfitability checks if it is profitable to interchange the
				/// loop.
				class LoopInterchangeProfitability {
				public:
				LoopInterchangeProfitability(Loop *Outer, Loop *Inner, ScalarEvolution *SE)
				: OuterLoop(Outer), InnerLoop(Inner), SE(SE) {}

				/// Check if the loop interchange is profitable
				bool isProfitable();

				private:
				int getInstrOrderCost();

				Loop *OuterLoop;
				Loop *InnerLoop;

				/// Scev analysis.
				ScalarEvolution *SE;
				};

				/// LoopInterchangeTransform interchanges the loop
				class LoopInterchangeTransform {
				public:
				LoopInterchangeTransform(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
				LoopInfo *LI, DominatorTree *DT)
				: OuterLoop(Outer), InnerLoop(Inner), SE(SE), LI(LI), DT(DT) {
				initialize();
				}

				/// Interchange OuterLoop and InnerLoop.
				bool transform();
				void initialize();

				private:
				void splitInnerLoopLatch(Instruction *);
				void splitOuterLoopLatch();
				void splitInnerLoopHeader();
				void adjustOuterLoopLatch();
				bool adjustLoopLinks();
				void adjustOuterLoopPreheader();
				bool adjustLoopBranches();

				Loop *OuterLoop;
				Loop *InnerLoop;
				BasicBlock *InnerLoopHeader;
				BasicBlock *OuterLoopHeader;
				BasicBlock *InnerLoopLatch;
				BasicBlock *OuterLoopLatch;
				BasicBlock *OuterLatchLcssaPhiBlock;
				BasicBlock *OuterLoopPreHeader;
				BasicBlock *InnerLoopPreHeader;
				BasicBlock *InnerLoopLatchPred;
				BasicBlock *InnerLoopHeaderSucc;
				std::vector<std::pair<Loop *, Loop *>> interchangedLoops;
				/// Scev analysis.
				ScalarEvolution *SE;
				LoopInfo *LI;
				DominatorTree *DT;
				};

				// Main LoopInterchange Pass
				struct LoopInterchange : public FunctionPass {
				static char ID;
				ScalarEvolution *SE;
				LoopInfo *LI;
				DependenceAnalysis *DA;
				DominatorTree *DT;
				LoopInterchange()
				: FunctionPass(ID), SE(nullptr), LI(nullptr), DA(nullptr), DT(nullptr) {
				initializeLoopInterchangePass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<ScalarEvolution>();
				AU.addRequired<AliasAnalysis>();
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<LoopInfoWrapperPass>();
				AU.addRequired<DependenceAnalysis>();
				AU.addRequiredID(LoopSimplifyID);
				AU.addPreservedID(LoopSimplifyID);
				AU.addRequiredID(LCSSAID);
				AU.addPreservedID(LCSSAID);
				}

				bool runOnFunction(Function &F) override {
				SE = &getAnalysis<ScalarEvolution>();
				LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
				DA = &getAnalysis<DependenceAnalysis>();
				auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();
				DT = DTWP ? &DTWP->getDomTree() : nullptr;
				hfinkel: I'd move this FIXME comment somewhere else, it is not particularly useful here. It is more useful to tag places that assume only two levels.
				karthikthecool: OK.
				// Build up a worklist of loop pairs to analyze.
				SmallVector<LoopPair, 8> Worklist;

				for (Loop *L : *LI)
				populateWorklist(*L, Worklist);

				DEBUG(dbgs() << "Worklist size = " << Worklist.size() << "\n");

				bool Changed = false;
				while (!Worklist.empty())
				Changed |= processLoop(Worklist.pop_back_val());

				return Changed;
				}

				bool processLoop(LoopPair P) {
				// Check if it is legal to interchange loop
				LoopInterchangeLegality LIL(P.first, P.second, SE, DA, this);
				if (!LIL.canInterchangeLoops()) {
				DEBUG(dbgs() << "Not interchanging Loops. Cannot prove legality\n");
				return false;
				}
				DEBUG(dbgs() << "Loops are legal to interchange\n");
				LoopInterchangeProfitability LIP(P.first, P.second, SE);
				if (!LIP.isProfitable()) {
				DEBUG(dbgs() << "Interchanging Loops not profitable\n");
				return false;
				}
				LoopInterchangeTransform LIT(P.first, P.second, SE, LI, DT);
				LIT.transform();
				DEBUG(dbgs() << "Loops interchanged\n");
				return true;
				}
				};

				} // end of namespace

				static bool containsUnsafeInstructions(BasicBlock *BB) {
				for (auto I = BB->begin(), E = BB->end(); I != E; ++I) {
				if (I->mayHaveSideEffects() || I->mayReadFromMemory())
				return true;
				}
				return false;
				}

				bool LoopInterchangeLegality::tightlyNested(Loop *OuterLoop, Loop *InnerLoop) {
				BasicBlock *OuterLoopHeader = OuterLoop->getHeader();
				BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
				BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();
				hfinkel: Either you should handle the case where the dyn_cast fails, or if it can't fail (because we've already verified that this must be a BranchInst), then use cast<> instead. This same comment applies to many places below as well. Only use dyn_cast if the cast can fail (in which case you should handle the nullptr case). Otherwise, use cast<>.
				karthikthecool: Updated code to use cast<> wherever possible. Added null checks in places where dyn_cast is being used.

				// A perfectly nested loop will not have any branch in between the outer and
				// inner block, i.e. the outer header will branch to either the inner
				// preheader or the outer loop latch.
				BranchInst *outerLoopHeaderBI =
				cast<BranchInst>(OuterLoopHeader->getTerminator());
				unsigned num = outerLoopHeaderBI->getNumSuccessors();
				for (unsigned i = 0; i < num; i++) {
				if (outerLoopHeaderBI->getSuccessor(i) != InnerLoopPreHeader &&
				outerLoopHeaderBI->getSuccessor(i) != OuterLoopLatch)
				return false;
				}
				hfinkel: Space before 'Any'
				karthikthecool: Modified comment.

				// We do not have any basic block in between; now make sure the outer header
				hfinkel: licm -> LICM
				karthikthecool: Modified comment.
				// and outer loop latch don't contain any unsafe instructions.
				if (containsUnsafeInstructions(OuterLoopHeader) ||
				containsUnsafeInstructions(OuterLoopLatch))
				return false;

				// We have a perfect loop nest.
				return true;
				}

				bool LoopInterchangeLegality::checkDependence(Loop *Outer,
				DependenceAnalysis *DA) {

				typedef SmallVector<Value *, 16> ValueVector;
				// Holds Load and Store instructions.
				ValueVector MemInstr;
				// For each block.
				for (Loop::block_iterator BI = Outer->block_begin(), BE = Outer->block_end();
				BI != BE; ++BI) {
				// Scan the BB and collect legal loads and stores.
				for (BasicBlock::iterator I = (*BI)->begin(), E = (*BI)->end(); I != E;
				++I) {
				Instruction *Ins = dyn_cast<Instruction>(I);
				hfinkel: What are you actually trying to check here? Instructions with side effects? Maybe you want I->mayHaveSideEffects()?
				karthikthecool: Hi Hal, the way I'm trying to conclude that a loop is tightly nested is as follows:
				1) There should not be any extra block between the outer loop and inner loop (i.e. in this case the outer loop header would branch to the inner loop preheader/inner loop body, and the other branch in the header would go to the outer loop latch). With this check we can catch loops which have a block in between the outer and inner loop, such as
				    for(int i=0;i<N;i++) { if(X) { } for(int j=0;j<N;j++) { } }
				and conclude that these are not tightly nested.
				2) A second type of non-nested loop can be
				    for(int i=0;i<N;i++) { a = i; k = A[i]; for(int j=0;j<N;j++) { } }
				These kinds of loops are caught by the second check, which checks that we have a single use of the indvar in the latch or header, namely the operand to the induction PHI (i.e. used to increment/decrement the loop counter). I have modified this function a bit in the updated patch.
				3) In the 3rd case I was trying to catch loops such as
				    for(int i=0;i<N;i++) { foo(); for(int j=0;j<N;j++) { } }
				I think we can do it using a combination of mayHaveSideEffects and mayReadFromMemory. Updated the patch.
				if (!Ins)
				return false;
				LoadInst *Ld = dyn_cast<LoadInst>(I);
				StoreInst *St = dyn_cast<StoreInst>(I);
				if (!St && !Ld)
				continue;
				if (Ld && !Ld->isSimple())
				return false;
				if (St && !St->isSimple())
				return false;
				MemInstr.push_back(Ins);
				}
				hfinkel: These checks look identical to those above, please make a function (a lambda function is fine).
				karthikthecool: Done.
				}

				DEBUG(dbgs() << "Found " << MemInstr.size()
				<< " Loads and stores to analyze\n");

				ValueVector::iterator I, IE, J, JE;
				for (I = MemInstr.begin(), IE = MemInstr.end(); I != IE; ++I) {
				for (J = I, JE = MemInstr.end(); J != JE; ++J) {
				Instruction *Src = cast<Instruction>(*I);
				Instruction *Des = cast<Instruction>(*J);
				if (Src == Des)
				continue;
				DEBUG(dbgs() << "Checking Dependency between Src " << *Src << " and Des"
				<< *Des << "\n");

				if (auto D = DA->depends(Src, Des, true)) {
				// TODO: Fix this; handle only anti/output dep for now.
				if (D->isFlow()) {
				// TODO: Flow dependency can be interchanged??
				DEBUG(dbgs() << "Flow dependence not handled");
				return false;
				}
				if (D->isAnti()) {
				DEBUG(dbgs() << "Found Anti dependence \n");
				unsigned Levels = D->getLevels();

				// If the two memory instructions have an anti dependence check
				// the distance or the direction by which they vary.
				// Interchanging two loops with anti dependence is valid if the
				// dependence distance is not positive in each level.
				for (unsigned II = 1; II <= Levels; ++II) {
				const SCEV *Distance = D->getDistance(II);
				const SCEVConstant *SCEVConst =
				dyn_cast_or_null<SCEVConstant>(Distance);
				if (SCEVConst) {
				const ConstantInt *CI = SCEVConst->getValue();
				if (!CI || (!CI->isNegative() && !CI->isZeroValue()))
				return false;
				} else if (D->isScalar(II)) {
				DEBUG(dbgs()
				<< "TODO:Scalars dependence are currently not handled\n");
				return false;
				} else {
				unsigned Direction = D->getDirection(II);
				if (Direction == Dependence::DVEntry::LT ||
				Direction == Dependence::DVEntry::LE ||
				Direction == Dependence::DVEntry::EQ)
				continue;
				return false;
				}
				}
				}
				}
				}
				hfinkel: Can you include Src and Des in these debug messages so that we can see what instructions are relevant?
				karthikthecool: Done.
				}

				return true;
				}
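For comparison with the check above, here is a minimal sketch of the textbook direction-vector rule for interchange legality. It is not what this patch implements (the patch rejects flow dependences outright and inspects anti dependences level by level); it assumes each dependence is summarized as a string of `<`, `=` and `>` entries, outermost loop first, and the names `DirectionVector`, `isLexicographicallyNonNegative` and `isInterchangeLegal` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One direction vector per dependence, one character per loop level
// (outermost first): '<', '=' or '>'. Combined directions such as "<=" or "*"
// are deliberately not modelled in this sketch.
using DirectionVector = std::string;

// A direction vector is lexicographically non-negative when its first
// non-'=' entry is '<', or when every entry is '='.
bool isLexicographicallyNonNegative(const DirectionVector &DV) {
  for (char Dir : DV) {
    if (Dir == '<')
      return true;
    if (Dir == '>')
      return false;
  }
  return true;
}

// Interchanging two loop levels permutes the matching columns of every
// direction vector. The interchange is legal if no permuted vector becomes
// lexicographically negative, i.e. no dependence would run backwards.
bool isInterchangeLegal(std::vector<DirectionVector> Deps, std::size_t LevelA,
                        std::size_t LevelB) {
  for (DirectionVector &DV : Deps) {
    assert(LevelA < DV.size() && LevelB < DV.size() && "level out of range");
    std::swap(DV[LevelA], DV[LevelB]);
    if (!isLexicographicallyNonNegative(DV))
      return false;
  }
  return true;
}
```

Under this rule a dependence with directions `(<, >)` blocks the interchange, because swapping the columns yields `(>, <)`, while `(=, <)` and `(<, <)` do not.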

				// This function indicates the current limitations in the transform as a result
				// of which we do not proceed.
				bool LoopInterchangeLegality::currentLimitations() {

				BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
				BasicBlock *InnerLoopHeader = InnerLoop->getHeader();
				BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();
				int PhiCount = 0;
				PHINode *PHI;

				// We currently handle only 1 induction variable inside the loop. We also do
				// not handle reductions as of now.
				for (auto I = InnerLoopHeader->begin(); isa<PHINode>(I); ++I) {
				PHI = cast<PHINode>(I);
				PhiCount++;
				if (PhiCount > 1)
				return true;
				}

				// TODO: Current limitation: Since we split the inner loop latch at the point
				// where the induction variable is incremented (induction.next), we cannot
				// have more than 1 user of induction.next since it would result in broken
				// code after the split.
				// e.g.
				// for(i=0;i<N;i++) {
				// for(j = 0;j<M;j++) {
				// A[j+1][i+2] = A[j][i]+k;
				// }
				// }
				bool FoundInduction = false;
				Instruction *InnerIndexVarInc = nullptr;
				if (PHI->getIncomingBlock(0) == InnerLoopPreHeader)
				InnerIndexVarInc = dyn_cast<Instruction>(PHI->getIncomingValue(1));
				else
				InnerIndexVarInc = dyn_cast<Instruction>(PHI->getIncomingValue(0));

				if (!InnerIndexVarInc)
				return true;

				// Since we split the inner loop latch on this induction variable, make sure
				// we do not have any instruction between the induction variable and branch
				// instruction.

				for (auto I = InnerLoopLatch->rbegin(), E = InnerLoopLatch->rend();
				I != E && !FoundInduction; ++I) {
				if (isa<BranchInst>(*I) || isa<CmpInst>(*I) || isa<TruncInst>(*I))
				continue;
				hfinkel: How are you checking for reductions here? Do you need to check that the one PHI you've found is not used outside of the loop?
				karthikthecool: We are currently checking that there is only 1 PHI node in the loop header, which will correspond to the induction variable. If we find any other PHIs, either due to reductions or a triangular loop structure, we currently exit as a current limitation.
				hfinkel: No, I mean uses outside of the loops in general. I don't think you check for that. You check for:
				    if (numUsageinLatch + numUsageinHeader != 1) return false;
				but the PHI could be used in any block dominated by the loop. Do you need to check for that?
				    int i, j;
				    for (int i = 0; i < n; ++i)
				      for (int j = 0; j < m; ++j)
				        a[i][j] = 7;
				    cout << "final i, j = " << i << ", " << j << "\n";
				karthikthecool: Hi Hal, the way I was handling this was: since the loops were tightly coupled, we were getting the LCSSA PHI for these loops in the outer loop latch, which I was splitting and moving outside the loop. I was able to get the correct value for i and j in this case. But I think I can add a check to avoid these cases as well. Since we do not want uses outside the loop, is it OK to have a check like
				    if (isa<PHINode>(InnerLoopLatch->begin())) return false;
				    if (isa<PHINode>(OuterLoopLatch->begin())) return false;
				This will make sure we do not have any outside uses defined inside the loop. Does this check look good? Will this make the transform too restrictive? Thanks for answering my silly queries, I'm still getting hold of loop optimizations. Regards, Karthik Bhat
				hfinkel: Ah, you're right. I think that, given our current restrictions, the final values outside the loop nest will always be the same, so this is fine. (we should have a regression test showing that we still can interchange in this case).
				const Instruction &Ins = *I;
				// We found an instruction. If this is not induction variable then it is not
				// safe to split this loop latch.
				if (!Ins.isIdenticalTo(InnerIndexVarInc))
				return true;
				else
				FoundInduction = true;
				}
				// The loop latch ended and we didn't find the induction variable; return as
				// a current limitation.
				if (!FoundInduction)
				return true;

				return false;
				}
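The exchange above about uses of the induction variables after the loop nest can be checked at the source level. The sketch below is an illustration only, assuming small positive constant trip counts (`Rows` and `Cols` are made-up values) and induction variables declared outside the loops so their final values stay visible; it runs the nest from the review comment in both loop orders.

```cpp
#include <array>
#include <cassert>

constexpr int Rows = 3; // assumed trip count of the i loop
constexpr int Cols = 4; // assumed trip count of the j loop

struct Result {
  std::array<std::array<int, Cols>, Rows> A;
  int FinalI;
  int FinalJ;
};

// Original order: i outer, j inner.
Result runOriginal() {
  Result R{};
  int i = 0, j = 0;
  for (i = 0; i < Rows; ++i)
    for (j = 0; j < Cols; ++j)
      R.A[i][j] = 7;
  R.FinalI = i;
  R.FinalJ = j;
  return R;
}

// Interchanged order: j outer, i inner. The loop body is unchanged.
Result runInterchanged() {
  Result R{};
  int i = 0, j = 0;
  for (j = 0; j < Cols; ++j)
    for (i = 0; i < Rows; ++i)
      R.A[i][j] = 7;
  R.FinalI = i;
  R.FinalJ = j;
  return R;
}
```

With both trip counts positive, each variable ends at its own loop bound in either order, which matches the conclusion reached in the thread. If one of the loops could run zero times the final values could differ, so that case would need separate thought.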

				bool LoopInterchangeLegality::canInterchangeLoops() {

				BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();
				BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
				if (!OuterLoopPreHeader || OuterLoopPreHeader == OuterLoop->getHeader()) {
				OuterLoopPreHeader = InsertPreheaderForLoop(OuterLoop, Parent);
				}
				if (!InnerLoopPreHeader || InnerLoopPreHeader == OuterLoop->getHeader() ||
				InnerLoopPreHeader == InnerLoop->getHeader()) {
				InnerLoopPreHeader = InsertPreheaderForLoop(InnerLoop, Parent);
				}

				// ScalarEvolution needs to be able to find the exit count.
				const SCEV *ExitCountOuter = SE->getBackedgeTakenCount(OuterLoop);
				const SCEV *ExitCountInner = SE->getBackedgeTakenCount(InnerLoop);
				hfinkel: Why are you only counting uses in the latch block? Should the increment be in some other block, then what?
				karthikthecool: We count the uses in the loop latch only, as we split the latch based on this instruction. This was done because the mentioned example generated code as:
				    for.body3:                                        ; preds = %for.body3, %for.body3.lr.ph
				      %j.018 = phi i32 [ 0, %for.body3.lr.ph ], [ %add6, %for.body3 ]
				      %arrayidx4 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %j.018, i32 %i.020
				      %5 = load i32* %arrayidx4, align 4, !tbaa !1
				      %add = add nsw i32 %3, %5
				      %add6 = add nuw nsw i32 %j.018, 1
				      %arrayidx8 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %add6, i32 %add5
				      store i32 %add, i32* %arrayidx8, align 4, !tbaa !1
				      %exitcond = icmp eq i32 %j.018, %4
				      br i1 %exitcond, label %for.inc9.loopexit, label %for.body3
				Since we cannot split at %add6 = add nuw nsw i32 %j.018, 1, we give up in this case. But now that I think about it, counting uses may not be the right method to check whether we can split the inner loop latch. Consider the following valid loop where we fail with this check:
				    for(int i=0;i<100;i++)
				      for(int j=0;j<100;j++)
				        A[j][i] = A[j][i]+k;
				Here we get the inner loop latch as:
				    for.body3:                                        ; preds = %for.body3, %for.cond1.preheader
				      %j.015 = phi i32 [ 0, %for.cond1.preheader ], [ %inc, %for.body3 ]
				      %arrayidx4 = getelementptr inbounds [100 x [100 x i32]]* @A, i32 0, i32 %j.015, i32 %i.016
				      %1 = load i32* %arrayidx4, align 4, !tbaa !1
				      %add = add nsw i32 %0, %1
				      store i32 %add, i32* %arrayidx4, align 4, !tbaa !1
				      %inc = add nuw nsw i32 %j.015, 1
				      %exitcond = icmp eq i32 %inc, 100
				      br i1 %exitcond, label %for.inc7, label %for.body3
				This could have been split at %inc = add nuw nsw i32 %j.015, 1, but we fail as we find more than 1 use. Modified the logic to check for a tightly grouped inner loop latch which can be split.
				if (ExitCountOuter == SE->getCouldNotCompute() ||
				ExitCountInner == SE->getCouldNotCompute()) {
				DEBUG(dbgs() << "Could not determine number of loop iterations\n");
				return false;
				}

				// We must have a single backedge.
				if (OuterLoop->getNumBackEdges() != 1 || InnerLoop->getNumBackEdges() != 1) {
				DEBUG(dbgs() << "loop control flow is not understood");
				return false;
				}
				// We must have a single exiting block.
				if (!OuterLoop->getExitingBlock() || !InnerLoop->getExitingBlock()) {
				hfinkel: Let's say, "Inner or outer loops lack a preheader"? Also, for the future, adding a preheader when one is not present is pretty easy (you just need to call InsertPreheaderForLoop from llvm/Transforms/Utils/LoopUtils.h), so this is a limitation that should be removed sooner rather than later (although after the initial commit is okay).
				karthikthecool: Updated code to add a preheader when not present.
				DEBUG(dbgs() << "loop control flow is not understood");
				return false;
				}
				// Check if the loops are tightly nested.
				if (!tightlyNested(OuterLoop, InnerLoop)) {
				DEBUG(dbgs() << "Loops not tightly nested\n");
				return false;
				}

				// TODO: The loops could not be interchanged due to current limitations in the
				// transform module.
				if (currentLimitations()) {
				DEBUG(dbgs() << "Not legal because of current transform limitation\n");
				return false;
				}
				return checkDependence(OuterLoop, DA);
				}

				int LoopInterchangeProfitability::getInstrOrderCost() {
				unsigned GoodOrder, BadOrder;
				BadOrder = GoodOrder = 0;
				for (auto BI = InnerLoop->block_begin(), BE = InnerLoop->block_end();
				BI != BE; ++BI) {
				for (auto I = (*BI)->begin(), E = (*BI)->end(); I != E; ++I) {
				const Instruction &Ins = *I;
				if (const GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(&Ins)) {
				unsigned NumOp = GEP->getNumOperands();
				for (unsigned i = 0; i < NumOp; ++i) {
				const SCEV *OperandVal = SE->getSCEV(GEP->getOperand(i));
				const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(OperandVal);
				if (!AR)
				continue;
				// If leftmost operand comes from inner loop then it is a bad order.
				if (AR->getLoop() == InnerLoop) {
				BadOrder++;
				break;
				}
				// If leftmost operand comes from outer loop then it is a good order.
				if (AR->getLoop() == OuterLoop) {
				GoodOrder++;
				break;
				hfinkel: if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(UseInstr)) { ... }
				karthikthecool: Done.
				}
				}
				hfinkel: What happens if it is not the IV directly, but some expression of the IV? I think you'd be better off using ScalarEvolution here, get the AddRec of the GEP, and see if the "outer" AddRec is provided in terms of the SCEV of the IV (or something like that).
				karthikthecool: Updated code to use SCEV to get the loop from which the operand comes, to decide whether it is a good or bad load. After the change we are able to handle code like
				    for(i=0;i<N;i+=1)
				      for(j=0;j<N;j++)
				        A[j-1][i-1] = A[j-1][i-1]+C[j-1][i-1];
				This now gets vectorized after interchange.
				}
				}
				}

				return GoodOrder - BadOrder;
				}
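The counting heuristic in getInstrOrderCost() can be sketched without LLVM. The model below is an assumption-laden simplification: each array access is reduced to a list of subscripts tagged with the loop whose induction variable drives them (`SubscriptLoop` and `Access` are hypothetical names), which stands in for walking the GEP operands and asking ScalarEvolution for each operand's loop.

```cpp
#include <cassert>
#include <vector>

// Which loop's induction variable a subscript varies with, if any.
enum class SubscriptLoop { Invariant, Outer, Inner };

// Subscripts of one array access, leftmost first, e.g. A[j][i] with j driven
// by the inner loop is {Inner, Outer}.
using Access = std::vector<SubscriptLoop>;

// Mirrors the counting in the patch: only the leftmost loop-variant subscript
// of each access is considered. Driven by the inner loop it counts as a bad
// order, driven by the outer loop it counts as a good order.
int instrOrderCost(const std::vector<Access> &Accesses) {
  int Good = 0, Bad = 0;
  for (const Access &Acc : Accesses) {
    for (SubscriptLoop S : Acc) {
      if (S == SubscriptLoop::Invariant)
        continue; // not an add recurrence; look at the next subscript
      if (S == SubscriptLoop::Inner)
        ++Bad;
      else
        ++Good;
      break;
    }
  }
  return Good - Bad;
}

// As in isProfitable(): interchange only when bad orders outnumber good ones.
bool isInterchangeProfitable(const std::vector<Access> &Accesses) {
  return instrOrderCost(Accesses) < 0;
}
```

For a row-major array, `A[j][i]` with `j` as the inner induction variable strides across rows on every inner iteration, so it counts as a bad order and pushes the cost negative; `A[i][j]` is already in the preferred order.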

				bool LoopInterchangeProfitability::isProfitable() {
				// TODO: Add better profitability checks.
				// e.g
				// 1) Construct dependency matrix and move the one with no loop carried dep
				// inside to enable vectorization.
				// 2) If reordering results in inner loop having stride of 1 etc.
				hfinkel: ENABLE_DEBUGGING is too generic for this. How about calling this: DUMP_DEP_MATRICIES

				// This is rough cost estimation algorithm. It counts the good and bad order
				// of induction variables in the instruction and allows reordering if number
				// of bad orders is more than good.
				int Cost = 0;
				Cost += getInstrOrderCost();
				DEBUG(dbgs() << "Cost = " << Cost << "\n");
				if (Cost < 0)
				return true;
				return false;
				}

				bool LoopInterchangeTransform::transform() {

				DEBUG(dbgs() << "transform\n");
				bool Transformed = false;
				Instruction *InnerIndexVar;
				PHINode *InductionPHI = getInductionVariable(InnerLoop, SE);
				if (!InductionPHI) {
				DEBUG(dbgs() << "Failed to find the point to split loop latch \n");
				return false;
				}

				if (InductionPHI->getIncomingBlock(0) == InnerLoopPreHeader)
				InnerIndexVar = dyn_cast<Instruction>(InductionPHI->getIncomingValue(1));
				else
				InnerIndexVar = dyn_cast<Instruction>(InductionPHI->getIncomingValue(0));

				// Split at the place where the induction variable is incremented/decremented.
				// TODO: This splitting logic may not always work. Fix this.
				splitInnerLoopLatch(InnerIndexVar);
				hfinkel: Please make this TODO more specific. What happens now and what should happen instead?
				DEBUG(dbgs() << "splitInnerLoopLatch Done\n");

				// Splits the inner loop's PHI nodes out into a separate basic block.
				hfinkel: Use SplitBlock from include/llvm/Transforms/Utils/BasicBlockUtils.h? (same for other functions below)?
				karthikthecool (Author): Updated code.
				splitInnerLoopHeader();
				DEBUG(dbgs() << "splitInnerLoopHeader Done\n");

				// Splits the LCSSA PHI nodes into a separate block.
				splitOuterLoopLatch();
				DEBUG(dbgs() << "splitOuterLoopLatch Done\n");

				adjustOuterLoopLatch();

				Transformed |= adjustLoopLinks();
				if (!Transformed) {
				DEBUG(dbgs() << "adjustLoopLinks Failed\n");
				return false;
				}
				SE->forgetLoop(OuterLoop);
				SE->forgetLoop(InnerLoop);
				return true;
				}

				void LoopInterchangeTransform::initialize() {
				InnerLoopHeader = InnerLoop->getHeader();
				OuterLoopHeader = OuterLoop->getHeader();
				InnerLoopLatch = InnerLoop->getLoopLatch();
				OuterLoopLatch = OuterLoop->getLoopLatch();
				OuterLoopPreHeader = OuterLoop->getLoopPreheader();
				InnerLoopPreHeader = InnerLoop->getLoopPreheader();
				}

				void LoopInterchangeTransform::splitInnerLoopLatch(Instruction *inc) {

				BasicBlock::iterator I = InnerLoopLatch->begin();
				BasicBlock::iterator E = InnerLoopLatch->end();
				for (; I != E; ++I) {
				if (inc == I)
				break;
				}

				// Split the inner loop latch out.
				InnerLoopLatchPred = InnerLoopLatch;
				InnerLoopLatch = SplitBlock(InnerLoopLatchPred, I, DT, LI);
				}

				void LoopInterchangeTransform::splitOuterLoopLatch() {
				OuterLatchLcssaPhiBlock = OuterLoopLatch;
				OuterLoopLatch = SplitBlock(OuterLatchLcssaPhiBlock,
				OuterLoopLatch->getFirstNonPHI(), DT, LI);
				}

				void LoopInterchangeTransform::adjustOuterLoopLatch() {

				for (auto BI = OuterLoop->block_begin(), BE = OuterLoop->block_end();
				BI != BE; ++BI) {
				BranchInst *BInstr = dyn_cast<BranchInst>((*BI)->getTerminator());
				if (!BInstr)
				continue;
				unsigned NumOp = BInstr->getNumSuccessors();
				for (unsigned i = 0; i < NumOp; ++i) {
				if (BInstr->getSuccessor(i) == OuterLatchLcssaPhiBlock) {
				BInstr->setSuccessor(i, OuterLoopLatch);
				}
				}
				}
				// Now set OuterLatchLcssaPhiBlock as successor of OuterLoopLatch.
				BranchInst *BInstr = cast<BranchInst>(OuterLoopLatch->getTerminator());
				BasicBlock *LoopNestExit;
				if (BInstr->getSuccessor(0) == OuterLoopHeader) {
				LoopNestExit = BInstr->getSuccessor(1);
				BInstr->setSuccessor(1, OuterLatchLcssaPhiBlock);
				} else {
				LoopNestExit = BInstr->getSuccessor(0);
				BInstr->setSuccessor(0, OuterLatchLcssaPhiBlock);
				}

				// The incoming blocks changed; adjust the PHI nodes in OuterLatchLcssaPhiBlock.
				// One incoming edge comes from the outer preheader (after it checks the
				// inner loop's condition) and the other from the inner loop latch.
				for (auto I = OuterLatchLcssaPhiBlock->begin(); isa<PHINode>(I); ++I) {
				PHINode *LCSSAPHI = cast<PHINode>(I);
				unsigned NumOp = LCSSAPHI->getNumIncomingValues();
				for (unsigned i = 0; i < NumOp; ++i) {
				if (LCSSAPHI->getIncomingBlock(i) != OuterLoopHeader)
				LCSSAPHI->setIncomingBlock(i, InnerLoopLatch);
				else
				LCSSAPHI->setIncomingBlock(i, OuterLoopPreHeader);
				}
				}

				BInstr = cast<BranchInst>(OuterLatchLcssaPhiBlock->getTerminator());
				BInstr->setSuccessor(0, LoopNestExit);

				// The outer loop's LCSSA nodes have now been moved outside the loop. This is
				// done because after interchange we need to check the inner loop's branch
				// condition in the preheader and exit the loop if the condition fails.
				LI->removeBlock(OuterLatchLcssaPhiBlock);

				// The incoming block changed; adjust the PHI nodes in LoopNestExit.
				for (auto I = LoopNestExit->begin(); isa<PHINode>(I); ++I) {
				PHINode *PHI = cast<PHINode>(I);
				unsigned NumOp = PHI->getNumIncomingValues();
				for (unsigned i = 0; i < NumOp; ++i) {
				if (PHI->getIncomingBlock(i) == OuterLoopLatch)
				PHI->setIncomingBlock(i, OuterLatchLcssaPhiBlock);
				hfinkel: No need for { } here.
				karthikthecool (Author): Done.
				}
				}
				}

				void LoopInterchangeTransform::splitInnerLoopHeader() {

				// Split the inner loop header out.
				InnerLoopHeaderSucc =
				SplitBlock(InnerLoopHeader, InnerLoopHeader->getFirstNonPHI(), DT, LI);

				DEBUG(dbgs() << "Output of splitInnerLoopHeader InnerLoopHeaderSucc & "
				"InnerLoopHeader \n");
				}

				void LoopInterchangeTransform::adjustOuterLoopPreheader() {
				// Adjust the outerLoop preheader to jump to inner loop preheader and
				// to loop nest exit based on the inner loop's branch condition.
				BranchInst *outerLoopHeaderBI =
				cast<BranchInst>(OuterLoopHeader->getTerminator());
				BranchInst *outerLoopPreHeaderBI =
				cast<BranchInst>(OuterLoopPreHeader->getTerminator());
				BranchInst *ExitInst = cast<BranchInst>(OuterLoopLatch->getTerminator());

				BasicBlock *TrueBlock;
				BasicBlock *FalseBlock;
				BasicBlock *LoopNestExitBlock;
				if (outerLoopHeaderBI->isUnconditional()) {
				BranchInst::Create(InnerLoopPreHeader, outerLoopPreHeaderBI);
				outerLoopPreHeaderBI->eraseFromParent();
				return;
				}
				// Find the loop nest exit block.
				if (ExitInst->getSuccessor(0) == OuterLoopHeader)
				LoopNestExitBlock = ExitInst->getSuccessor(1);
				else
				LoopNestExitBlock = ExitInst->getSuccessor(0);
				hfinkel: Please make this more specific. We do handle anti deps. What needs to happen?
				// OuterLoopPreheader will branch to inner loop preheader and exit to Loop
				// nest exit based on inner loops branch condition. On exit condition we
				// should now branch to loop exit.
				// Rewrite the Branch instruction to handle this.
				if (outerLoopHeaderBI->getSuccessor(0) == InnerLoopPreHeader) {
				TrueBlock = InnerLoopPreHeader;
				FalseBlock = LoopNestExitBlock;
				} else {
				FalseBlock = InnerLoopPreHeader;
				TrueBlock = LoopNestExitBlock;
				}
				BranchInst::Create(TrueBlock, FalseBlock, outerLoopHeaderBI->getCondition(),
				outerLoopPreHeaderBI);
				outerLoopPreHeaderBI->eraseFromParent();
				return;
				}

				bool LoopInterchangeTransform::adjustLoopBranches() {

				SmallVector<PHINode *, 8> LoopPhis;
				SmallVector<Instruction *, 8> DepInstr;
				BranchInst *InnerPreheaderBI =
				dyn_cast<BranchInst>(InnerLoopPreHeader->getTerminator());
				BranchInst *InnerHeaderBI =
				dyn_cast<BranchInst>(InnerLoopHeader->getTerminator());
				BranchInst *OuterHeaderBI =
				dyn_cast<BranchInst>(OuterLoopHeader->getTerminator());
				BranchInst *InnerLoopLatchPredBI =
				dyn_cast<BranchInst>(InnerLoopLatchPred->getTerminator());
				BranchInst *OuterLoopLatchBI =
				dyn_cast<BranchInst>(OuterLoopLatch->getTerminator());
				BranchInst *InnerLoopLatchBI =
				dyn_cast<BranchInst>(InnerLoopLatch->getTerminator());
				BranchInst *InnerLoopHeaderSuccBI =
				dyn_cast<BranchInst>(InnerLoopHeaderSucc->getTerminator());

				BasicBlock *LoopNestExit;
				hfinkel: Use llvm_unreachable, not assert(0 &&
				karthikthecool (Author): Done.
				if (!InnerPreheaderBI || !InnerHeaderBI || !OuterHeaderBI ||
				!InnerLoopLatchPredBI || !OuterLoopLatchBI || !InnerLoopLatchBI ||
				!InnerLoopHeaderSuccBI)
				llvm_unreachable(
				"This should not be triggered. We have already modified parts "
				"of the loop");

				// Update OuterLoopHeader PHI nodes as incoming block has changed.
				// Collect all Phi nodes and instructions in the outer loop header.
				for (auto I = OuterLoopHeader->begin(), E = OuterLoopHeader->end(); I != E;
				++I) {
				Instruction *Ins = I;
				if (isa<PHINode>(I))
				LoopPhis.push_back(cast<PHINode>(I));
				for (auto UI = Ins->user_begin(), UE = Ins->user_end(); UI != UE; ++UI) {
				User *Use = *UI;
				Instruction *U = dyn_cast<Instruction>(Use);
				if (!U)
				continue;
				// Find any instructions in the inner loop preheader that depend on
				// instructions in the outer loop header.
				if (U->getParent() == InnerLoopPreHeader) {
				DepInstr.push_back(U);
				}
				}
				}

				// Move these dependent instructions to the new inner loop header (the old
				// outer loop header).
				// Insert in the same order as they were present; use rbegin and rend.
				for (auto I = DepInstr.rbegin(), E = DepInstr.rend(); I != E; ++I) {
				Instruction *Ins = *I;
				Ins->moveBefore(OuterLoopHeader->getTerminator());
				}

				// Create an unconditional branch to the newly split innerLoopHeader
				// from the inner loop preheader. Erase the old branch instruction.
				BranchInst::Create(InnerLoopHeader, InnerPreheaderBI);
				InnerPreheaderBI->eraseFromParent();

				// Branch from the old inner loop header to outer loop header.
				// Erase old branch instruction.
				BranchInst::Create(OuterLoopHeader, InnerHeaderBI);
				InnerHeaderBI->eraseFromParent();

				// Adjust Phi nodes of the outer loop header. The previous incoming
				// OuterLoopPreHeader is now gone and the new incoming block is
				// innerLoopHeader.
				while (!LoopPhis.empty()) {
				PHINode *CurrIV = LoopPhis.pop_back_val();
				unsigned numIncomingBlocks = CurrIV->getNumIncomingValues();
				for (unsigned i = 0; i < numIncomingBlocks; ++i) {
				if (CurrIV->getIncomingBlock(i) == OuterLoopPreHeader) {
				CurrIV->setIncomingBlock(i, InnerLoopHeader);
				}
				}
				}

				// The outer header will now branch to InnerLoopHeaderSucc (which was
				// obtained after splitting the PHI nodes of the inner loop header) instead
				// of the inner loop preheader.
				BranchInst::Create(InnerLoopHeaderSucc, OuterHeaderBI);
				hfinkel: Why?
				OuterHeaderBI->eraseFromParent();

				BasicBlock *InnerLoopExitBlock;
				BasicBlock *InnerLoopExitIncomingBlock = nullptr;

				if (InnerLoopLatchBI->getSuccessor(0) == InnerLoopHeader)
				InnerLoopExitBlock = InnerLoopLatchBI->getSuccessor(1);
				else
				InnerLoopExitBlock = InnerLoopLatchBI->getSuccessor(0);

				if (InnerLoopHeaderSuccBI->getSuccessor(0) == InnerLoopLatch) {
				InnerLoopHeaderSuccBI->setSuccessor(0, InnerLoopExitBlock);
				InnerLoopExitIncomingBlock = InnerLoopHeaderSucc;
				} else if (!InnerLoopHeaderSuccBI->isUnconditional() &&
				InnerLoopHeaderSuccBI->getSuccessor(1) == InnerLoopLatch) {
				InnerLoopHeaderSuccBI->setSuccessor(1, InnerLoopExitBlock);
				InnerLoopExitIncomingBlock = InnerLoopHeaderSucc;
				}

				// If the split didn't result in the same basic block, adjust InnerLoopLatchPred.
				if (InnerLoopHeaderSuccBI != InnerLoopLatchPredBI) {
				if (InnerLoopLatchPredBI->getSuccessor(0) == InnerLoopLatch) {
				InnerLoopLatchPredBI->setSuccessor(0, InnerLoopExitBlock);
				InnerLoopExitIncomingBlock = InnerLoopLatchPred;
				} else if (!InnerLoopLatchPredBI->isUnconditional() &&
				InnerLoopLatchPredBI->getSuccessor(1) == InnerLoopLatch) {
				InnerLoopLatchPredBI->setSuccessor(1, InnerLoopExitBlock);
				InnerLoopExitIncomingBlock = InnerLoopLatchPred;
				}
				}
				hfinkel: PHIs are always at the beginning of the block; once you hit the first non-PHI, you can exit the loop (you should never find another).
				karthikthecool (Author): Yes you are right. Modified code.

				// Update the LCSSA PHIs in the exit block of the old inner loop, as the
				// incoming block has changed.
				for (auto I = InnerLoopExitBlock->begin(); isa<PHINode>(I); ++I) {
				PHINode *PHI = cast<PHINode>(I);
				unsigned numBlocks = PHI->getNumIncomingValues();
				for (unsigned i = 0; i < numBlocks; ++i) {
				if (PHI->getIncomingBlock(i) == InnerLoopLatch &&
				InnerLoopExitIncomingBlock != nullptr)
				PHI->setIncomingBlock(i, InnerLoopExitIncomingBlock);
				}
				}

				// exit of outerLoop latch will now branch to inner loop latch
				if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopHeader) {
				LoopNestExit = OuterLoopLatchBI->getSuccessor(1);
				OuterLoopLatchBI->setSuccessor(1, InnerLoopLatch);
				} else {
				LoopNestExit = OuterLoopLatchBI->getSuccessor(0);
				OuterLoopLatchBI->setSuccessor(0, InnerLoopLatch);
				}

				// exit of innerLoop latch will now branch to loop nest exit
				if (InnerLoopLatchBI->getSuccessor(0) == InnerLoopHeader)
				InnerLoopLatchBI->setSuccessor(1, LoopNestExit);
				else
				InnerLoopLatchBI->setSuccessor(0, LoopNestExit);

				return true;
				}

				bool LoopInterchangeTransform::adjustLoopLinks() {

				// Adjust the outerLoop preheader to jump to inner loop preheader and
				// to loop nest exit based on inner loops branch condition.
				adjustOuterLoopPreheader();

				// Adjust all branches so that outer loop is moved inside and inner loop is
				hfinkel: I don't really understand this comment. I think we can assume that LICM has run first. (and if this pass detects loop-invariant code better than LICM, that is another problem to fix, but not here).
				karthikthecool (Author): Hi Hal, consider the matrix multiplication code below:
				for(int i=0;i<N;i++) for(int j=0;j<N;j++) for(int k=0;k<N;k++) A[i][j] = A[i][j]+B[i][k]*C[k][j];
				In this example the direction vector would be [= = |<] (i.e. '=' dependency in i, '=' dependency in j, and a loop independent dependency in k). The LICM pass would move the getElementPointer for A[i][j] outside the inner loop, but it cannot move the complete statement outside the inner loop. Since the vectorizer only works on the inner loop, the above code is not vectorized for i,j. But if we interchange the loops to:
				for(int k=0;k<N;k++) for(int i=0;i<N;i++) for(int j=0;j<N;j++) A[i][j] = A[i][j]+B[i][k]*C[k][j];
				the loop now gets vectorized. It is mostly profitable to keep loop independent dependencies such as the above at the outermost possible level. We try to achieve the same here.
				// moved outside.
				adjustLoopBranches();
				return true;
				}

				char LoopInterchange::ID = 0;
				INITIALIZE_PASS_BEGIN(LoopInterchange, "loop-interchange",
				"Interchanges loops for cache reuse", false, false)
				INITIALIZE_AG_DEPENDENCY(AliasAnalysis)
				INITIALIZE_PASS_DEPENDENCY(DependenceAnalysis)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
				INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
				INITIALIZE_PASS_DEPENDENCY(LCSSA)
				INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)

				INITIALIZE_PASS_END(LoopInterchange, "loop-interchange",
				"Interchanges loops for cache reuse", false, false)

				Pass *llvm::createLoopInterchangePass() { return new LoopInterchange(); }
				hfinkel: No need for the { }
				karthikthecool (Author): Done.
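The matrix-multiplication interchange discussed in the inline comments can be checked at source level with a small standalone program. This is only a sketch: it does not use LLVM, and the fixed size `N`, the `Matrix` alias, and the function names are assumptions made for illustration. For every fixed (i, j) both loop orders add the same products in the same k order, so the two results are expected to match element by element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;  // small fixed size, chosen only for this sketch
using Matrix = std::array<std::array<int, N>, N>;

// Original nest from the comment: i, j, k. The innermost loop walks down a
// column of C, so consecutive iterations are N elements apart.
void matmulIJK(Matrix &A, const Matrix &B, const Matrix &C) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t k = 0; k < N; ++k)
        A[i][j] = A[i][j] + B[i][k] * C[k][j];
}

// Interchanged nest: k, i, j. The innermost loop now walks along rows of A
// and C with unit stride, and B[i][k] is invariant in it.
void matmulKIJ(Matrix &A, const Matrix &B, const Matrix &C) {
  for (std::size_t k = 0; k < N; ++k)
    for (std::size_t i = 0; i < N; ++i)
      for (std::size_t j = 0; j < N; ++j)
        A[i][j] = A[i][j] + B[i][k] * C[k][j];
}

// Hypothetical helper that fills a matrix with small deterministic values.
Matrix makeMatrix(std::size_t Seed) {
  Matrix M{};
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      M[i][j] = static_cast<int>((i * 3 + j * 5 + Seed) % 7);
  return M;
}
```

Running both functions on the same inputs and comparing the outputs with `==` is a cheap sanity check of the kind the summary mentions ("interchanging doesn't change the o/p of the program"); it says nothing about whether the pass itself finds the interchange legal or profitable.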

lib/Transforms/Scalar/Scalar.cpp

				...
				  void llvm::initializeScalarOpts(PassRegistry &Registry) {
				  initializeEarlyCSELegacyPassPass(Registry);
				  initializeFlattenCFGPassPass(Registry);
				  initializeInductiveRangeCheckEliminationPass(Registry);
				  initializeIndVarSimplifyPass(Registry);
				  initializeJumpThreadingPass(Registry);
				  initializeLICMPass(Registry);
				  initializeLoopDeletionPass(Registry);
				  initializeLoopInstSimplifyPass(Registry);
				+ initializeLoopInterchangePass(Registry);
				  initializeLoopRotatePass(Registry);
				  initializeLoopStrengthReducePass(Registry);
				  initializeLoopRerollPass(Registry);
				  initializeLoopUnrollPass(Registry);
				  initializeLoopUnswitchPass(Registry);
				  initializeLoopIdiomRecognizePass(Registry);
				  initializeLowerAtomicPass(Registry);
				  initializeLowerExpectIntrinsicPass(Registry);
				...

test/Transforms/LoopInterchange/currentLimitation.ll

				; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s
				;; These are tests that fail to interchange due to current limitations. They will go away once we extend the loop interchange pass.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = common global [100 x [100 x i32]] zeroinitializer
				@B = common global [100 x [100 x [100 x i32]]] zeroinitializer

				;;--------------------------------------Test case 01------------------------------------
				;; [FIXME] This loop, though valid, is currently not interchanged because we cannot split the inner loop latch when the inner induction
				;; variable has multiple uses (it is used both to increment the loop counter and to access A[j+1][i+1]).
				;; for(int i=0;i<N-1;i++)
				;; for(int j=1;j<N-1;j++)
				;; A[j+1][i+1] = A[j+1][i+1]+k;

				define void @interchange_01(i32 %k, i32 %N) {
				entry:
				%sub = add nsw i32 %N, -1
				%cmp26 = icmp sgt i32 %N, 1
				br i1 %cmp26, label %for.cond1.preheader.lr.ph, label %for.end17

				for.cond1.preheader.lr.ph:
				%cmp324 = icmp sgt i32 %sub, 1
				%0 = add i32 %N, -2
				%1 = sext i32 %sub to i64
				br label %for.cond1.preheader

				for.cond.loopexit:
				%cmp = icmp slt i64 %indvars.iv.next29, %1
				br i1 %cmp, label %for.cond1.preheader, label %for.end17

				for.cond1.preheader:
				%indvars.iv28 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next29, %for.cond.loopexit ]
				%indvars.iv.next29 = add nuw nsw i64 %indvars.iv28, 1
				br i1 %cmp324, label %for.body4, label %for.cond.loopexit

				for.body4:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body4 ], [ 1, %for.cond1.preheader ]
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%arrayidx7 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next, i64 %indvars.iv.next29
				%2 = load i32* %arrayidx7
				%add8 = add nsw i32 %2, %k
				store i32 %add8, i32* %arrayidx7
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.cond.loopexit, label %for.body4

				for.end17:
				ret void
				}
				;; Inner loop not split so it is not interchanged.
				; CHECK-LABEL: @interchange_01
				; CHECK: for.body4:
				; CHECK-NEXT: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body4 ], [ 1, %for.body4.preheader ]
				; CHECK-NEXT: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK-NEXT: %arrayidx7 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next, i64 %indvars.iv.next29


				;;--------------------------------------Test case 02------------------------------------
				;; [FIXME] Currently reductions are not supported.
				;; for(int i=0;i<N-1;i++)
				;; for(int j=1;j<N-1;j++)
				;; k+=A[i][j];

				define void @interchange_02(i32 %k, i32 %N) {
				entry:
				%cmp17 = icmp sgt i32 %N, 0
				br i1 %cmp17, label %for.cond1.preheader.lr.ph, label %for.end8

				for.cond1.preheader.lr.ph:
				%cmp214 = icmp sgt i32 %N, 1
				%0 = add i32 %N, -1
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv20 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next21, %for.inc6 ]
				br i1 %cmp214, label %for.body3, label %for.inc6

				for.body3:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.cond1.preheader ]
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc6, label %for.body3

				for.inc6:
				%indvars.iv.next21 = add nuw nsw i64 %indvars.iv20, 1
				%lftr.wideiv22 = trunc i64 %indvars.iv20 to i32
				%exitcond23 = icmp eq i32 %lftr.wideiv22, %0
				br i1 %exitcond23, label %for.end8, label %for.cond1.preheader

				for.end8:
				ret void
				}
				;; Inner loop phi is not split so it is not interchanged.
				; CHECK-LABEL: @interchange_02
				; CHECK: for.body3:
				; CHECK-NEXT: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.body3.preheader ]
				; CHECK-NEXT: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK-NEXT: %lftr.wideiv = trunc i64 %indvars.iv to i32


				;;--------------------------------------Test case 03------------------------------------
				;; [FIXME] Currently loops of depth greater than 2 are not handled.
				;; for(int i=0;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; for(int k=0;k<N;k++)
				;; B[k][j][i] = B[k][j][i]+M;

				define void @interchange_03(i32 %M, i32 %N) {
				entry:
				%cmp38 = icmp sgt i32 %N, 0
				br i1 %cmp38, label %for.cond1.preheader.lr.ph, label %for.end22

				for.cond1.preheader.lr.ph:
				%cmp236 = icmp sgt i32 %N, 1
				%0 = add i32 %N, -1
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv44 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next45, %for.inc20 ]
				br i1 %cmp236, label %for.cond4.preheader, label %for.inc20

				for.cond4.preheader:
				%indvars.iv40 = phi i64 [ %indvars.iv.next41, %for.inc17 ], [ 1, %for.cond1.preheader ]
				br label %for.body6

				for.body6:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body6 ], [ 0, %for.cond4.preheader ]
				%arrayidx10 = getelementptr inbounds [100 x [100 x [100 x i32]]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv40, i64 %indvars.iv44
				%1 = load i32* %arrayidx10
				%add = add nsw i32 %1, %M
				store i32 %add, i32* %arrayidx10
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc17, label %for.body6

				for.inc17: ; preds = %for.body6
				%indvars.iv.next41 = add nuw nsw i64 %indvars.iv40, 1
				%lftr.wideiv42 = trunc i64 %indvars.iv40 to i32
				%exitcond43 = icmp eq i32 %lftr.wideiv42, %0
				br i1 %exitcond43, label %for.inc20, label %for.cond4.preheader

				for.inc20: ; preds = %for.inc17, %for.cond1.preheader
				%indvars.iv.next45 = add nuw nsw i64 %indvars.iv44, 1
				%lftr.wideiv46 = trunc i64 %indvars.iv44 to i32
				%exitcond47 = icmp eq i32 %lftr.wideiv46, %0
				br i1 %exitcond47, label %for.end22, label %for.cond1.preheader

				for.end22: ; preds = %for.inc20, %entry
				ret void
				}

				;; Inner loop phi is not split so it is not interchanged.
				; CHECK-LABEL: @interchange_03
				; CHECK: for.body6:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body6 ], [ 0, %for.cond4.preheader ]
				; CHECK: %arrayidx10 = getelementptr inbounds [100 x [100 x [100 x i32]]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv40, i64 %indvars.iv44
				; CHECK: %1 = load i32* %arrayidx10
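For the reduction case in test case 02 above (k += A[i][j]), a source-level sketch can show why the interchanged nest would compute the same value even though the pass does not yet handle it. This is an illustration under stated assumptions, not part of the patch: it uses `unsigned` so that wrap-around addition is exactly commutative and associative (the test's C source uses `int`), and the names `Grid`, `sumOriginal`, and `sumInterchanged` are hypothetical. A floating-point reduction would not carry the same guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 16;  // arbitrary small size for the sketch
using Grid = std::array<std::array<unsigned, N>, N>;

// Loop order as written in the test comment: i outer, j inner (j starts at 1).
unsigned sumOriginal(const Grid &A, unsigned K) {
  for (std::size_t i = 0; i < N - 1; ++i)
    for (std::size_t j = 1; j < N - 1; ++j)
      K += A[i][j];
  return K;
}

// Interchanged order: j outer, i inner. The same elements are added, only in
// a different sequence, which modular unsigned addition does not observe.
unsigned sumInterchanged(const Grid &A, unsigned K) {
  for (std::size_t j = 1; j < N - 1; ++j)
    for (std::size_t i = 0; i < N - 1; ++i)
      K += A[i][j];
  return K;
}
```

Note that for this particular access pattern, A[i][j] with i outer, the original order is already the unit-stride one, so the sketch speaks only to equivalence of the results, not to whether interchanging would be profitable.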

test/Transforms/LoopInterchange/interchange.ll

				; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s
				;; We test the complete .ll for adjustment in outer loop header/latch and inner loop header/latch.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@A = common global [100 x [100 x i32]] zeroinitializer
				@B = common global [100 x i32] zeroinitializer
				@C = common global [100 x [100 x i32]] zeroinitializer
				declare void @foo(...)

				;;--------------------------------------Test case 01------------------------------------
				;; for(int i=0;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; A[j][i] = A[j][i]+k;

				define void @interchange_01(i32 %k, i32 %N) {
				entry:
				%cmp21 = icmp sgt i32 %N, 0
				br i1 %cmp21, label %for.cond1.preheader.lr.ph, label %for.end12

				for.cond1.preheader.lr.ph: ; preds = %entry
				%cmp219 = icmp sgt i32 %N, 1
				%0 = add i32 %N, -1
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.inc10, %for.cond1.preheader.lr.ph
				%indvars.iv23 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next24, %for.inc10 ]
				br i1 %cmp219, label %for.body3, label %for.inc10

				for.body3: ; preds = %for.cond1.preheader, %for.body3
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.cond1.preheader ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv23
				%1 = load i32* %arrayidx5
				%add = add nsw i32 %1, %k
				store i32 %add, i32* %arrayidx5
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc10, label %for.body3

				for.inc10: ; preds = %for.body3, %for.cond1.preheader
				%indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
				%lftr.wideiv25 = trunc i64 %indvars.iv23 to i32
				%exitcond26 = icmp eq i32 %lftr.wideiv25, %0
				br i1 %exitcond26, label %for.end12, label %for.cond1.preheader

				for.end12: ; preds = %for.inc10, %entry
				ret void
				}

				; CHECK-LABEL: @interchange_01
				; CHECK: entry:
				; CHECK: %cmp21 = icmp sgt i32 %N, 0
				; CHECK: br i1 %cmp21, label %for.cond1.preheader.lr.ph, label %for.end12

				; CHECK: for.cond1.preheader.lr.ph
				; CHECK: %cmp219 = icmp sgt i32 %N, 1
				; CHECK: %0 = add i32 %N, -1
				; CHECK: br i1 %cmp219, label %for.body3.preheader, label %for.inc10

				; CHECK: for.cond1.preheader
				; CHECK: %indvars.iv23 = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next24, %for.inc10.split ]
				; CHECK:br label %for.body3.split1

				; CHECK:for.body3.preheader:
				; CHECK:br label %for.body3

				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader ]
				; CHECK: br label %for.cond1.preheader

				; CHECK: for.body3.split1:
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv23
				; CHECK: %1 = load i32* %arrayidx5
				; CHECK: %add = add nsw i32 %1, %k
				; CHECK: store i32 %add, i32* %arrayidx5
				; CHECK: br label %for.inc10.loopexit

				; CHECK: for.body3.split:
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %0
				; CHECK: br i1 %exitcond, label %for.inc10, label %for.body3

				; CHECK: for.inc10.loopexit:
				; CHECK: br label %for.inc10.split

				; CHECK: for.inc10:
				; CHECK: br label %for.end12.loopexit

				; CHECK: for.inc10.split:
				; CHECK: %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
				; CHECK: %lftr.wideiv25 = trunc i64 %indvars.iv23 to i32
				; CHECK: %exitcond26 = icmp eq i32 %lftr.wideiv25, %0
				; CHECK: br i1 %exitcond26, label %for.body3.split, label %for.cond1.preheader

				; CHECK: for.end12.loopexit:
				; CHECK: br label %for.end12
				; CHECK: for.end12:
				; CHECK: ret void

				;;--------------------------------------Test case 02-------------------------------------

				;; for(int i=0;i<100;i++)
				;; for(int j=100;j>=0;j--)
				;; A[j][i] = A[j][i]+k;

				define void @interchange_02(i32 %k) {
				entry:
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.inc10, %entry
				%indvars.iv19 = phi i64 [ 0, %entry ], [ %indvars.iv.next20, %for.inc10 ]
				br label %for.body3

				for.body3: ; preds = %for.cond1.preheader, %for.body3
				%indvars.iv = phi i64 [ 100, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv19
				%0 = load i32* %arrayidx5
				%add = add nsw i32 %0, %k
				store i32 %add, i32* %arrayidx5
				%indvars.iv.next = add nsw i64 %indvars.iv, -1
				%cmp2 = icmp sgt i64 %indvars.iv, 0
				br i1 %cmp2, label %for.body3, label %for.inc10

				for.inc10: ; preds = %for.body3
				%indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1
				%exitcond = icmp eq i64 %indvars.iv.next20, 100
				br i1 %exitcond, label %for.end11, label %for.cond1.preheader

				for.end11: ; preds = %for.inc10
				ret void
				}

				; CHECK-LABEL: @interchange_02
				; CHECK: entry:
				; CHECK: br label %for.body3.preheader

				; CHECK: for.cond1.preheader:
				; CHECK: %indvars.iv19 = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next20, %for.inc10.split ]
				; CHECK: br label %for.body3.split1
				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3

				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 100, %for.body3.preheader ]
				; CHECK: br label %for.cond1.preheader

				; CHECK: for.body3.split1:
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv19
				; CHECK: %0 = load i32* %arrayidx5
				; CHECK: %add = add nsw i32 %0, %k
				; CHECK: store i32 %add, i32* %arrayidx5
				; CHECK: br label %for.inc10.split

				; CHECK: for.body3.split:
				; CHECK: %indvars.iv.next = add nsw i64 %indvars.iv, -1
				; CHECK: %cmp2 = icmp sgt i64 %indvars.iv, 0
				; CHECK: br i1 %cmp2, label %for.body3, label %for.inc10

				; CHECK: for.inc10:
				; CHECK: br label %for.end11

				; CHECK: for.inc10.split:
				; CHECK: %indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1
				; CHECK: %exitcond = icmp eq i64 %indvars.iv.next20, 100
				; CHECK: br i1 %exitcond, label %for.body3.split, label %for.cond1.preheader

				; CHECK: for.end11:
				; CHECK: ret void


				;;--------------------------------------Test case 03-------------------------------------
				;; Loops should not be interchanged in this case as it is not profitable.
				;; for(int i=0;i<100;i++)
				;; for(int j=0;j<100;j++)
				;; A[i][j] = A[i][j]+k;

				define void @interchange_03(i32 %k) {
				entry:
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv21 = phi i64 [ 0, %entry ], [ %indvars.iv.next22, %for.inc10 ]
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv21, i64 %indvars.iv
				%0 = load i32* %arrayidx5
				%add = add nsw i32 %0, %k
				store i32 %add, i32* %arrayidx5
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 100
				br i1 %exitcond, label %for.inc10, label %for.body3

				for.inc10:
				%indvars.iv.next22 = add nuw nsw i64 %indvars.iv21, 1
				%exitcond23 = icmp eq i64 %indvars.iv.next22, 100
				br i1 %exitcond23, label %for.end12, label %for.cond1.preheader

				for.end12:
				ret void
				}

				; CHECK-LABEL: @interchange_03
				; CHECK: entry:
				; CHECK: br label %for.cond1.preheader

				; CHECK: for.cond1.preheader:
				; CHECK: %indvars.iv21 = phi i64 [ 0, %entry ], [ %indvars.iv.next22, %for.inc10 ]
				; CHECK: br label %for.body3

				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 0, %for.body3.preheader ]
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv21, i64 %indvars.iv
				; CHECK: %0 = load i32* %arrayidx5
				; CHECK: %add = add nsw i32 %0, %k
				; CHECK: store i32 %add, i32* %arrayidx5
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %exitcond = icmp eq i64 %indvars.iv.next, 100
				; CHECK: br i1 %exitcond, label %for.inc10, label %for.body3

				; CHECK: for.inc10:
				; CHECK: %indvars.iv.next22 = add nuw nsw i64 %indvars.iv21, 1
				; CHECK: %exitcond23 = icmp eq i64 %indvars.iv.next22, 100
				; CHECK: br i1 %exitcond23, label %for.end12, label %for.cond1.preheader

				; CHECK: for.end12:
				; CHECK: ret void


				;;--------------------------------------Test case 04-------------------------------------
				;; Loops should not be interchanged in this case as it is not legal due to dependency.
				;; for(int j=0;j<99;j++)
				;; for(int i=0;i<99;i++)
				;; A[j][i+1] = A[j+1][i]+k;

				define void @interchange_04(i32 %k){
				entry:
				br label %for.cond1.preheader

				for.cond1.preheader:
				%indvars.iv23 = phi i64 [ 0, %entry ], [ %indvars.iv.next24, %for.inc12 ]
				%indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next24, i64 %indvars.iv
				%0 = load i32* %arrayidx5
				%add6 = add nsw i32 %0, %k
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%arrayidx11 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv23, i64 %indvars.iv.next
				store i32 %add6, i32* %arrayidx11
				%exitcond = icmp eq i64 %indvars.iv.next, 99
				br i1 %exitcond, label %for.inc12, label %for.body3

				for.inc12:
				%exitcond25 = icmp eq i64 %indvars.iv.next24, 99
				br i1 %exitcond25, label %for.end14, label %for.cond1.preheader

				for.end14:
				ret void
				}

				; CHECK-LABEL: @interchange_04
				; CHECK: entry:
				; CHECK: br label %for.cond1.preheader

				; CHECK: for.cond1.preheader:
				; CHECK: %indvars.iv23 = phi i64 [ 0, %entry ], [ %indvars.iv.next24, %for.inc12 ]
				; CHECK: %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
				; CHECK: br label %for.body3.preheader

				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3

				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 0, %for.body3.preheader ]
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next24, i64 %indvars.iv
				; CHECK: %0 = load i32* %arrayidx5
				; CHECK: %add6 = add nsw i32 %0, %k
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %arrayidx11 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv23, i64 %indvars.iv.next
				; CHECK: store i32 %add6, i32* %arrayidx11
				; CHECK: %exitcond = icmp eq i64 %indvars.iv.next, 99
				; CHECK: br i1 %exitcond, label %for.inc12, label %for.body3

				; CHECK: for.inc12:
				; CHECK: %exitcond25 = icmp eq i64 %indvars.iv.next24, 99
				; CHECK: br i1 %exitcond25, label %for.end14, label %for.cond1.preheader

				; CHECK: for.end14:
				; CHECK: ret void



				;;--------------------------------------Test case 05-------------------------------------
				;; Loops not tightly nested are not interchanged
				;; for(int j=0;j<N;j++) {
				;; B[j] = j+k;
				;; for(int i=0;i<N;i++)
				;; A[j][i] = A[j][i]+B[j];
				;; }

				define void @interchange_05(i32 %k, i32 %N){
				entry:
				%cmp30 = icmp sgt i32 %N, 0
				br i1 %cmp30, label %for.body.lr.ph, label %for.end17

				for.body.lr.ph:
				%0 = add i32 %N, -1
				%1 = zext i32 %k to i64
				br label %for.body

				for.body:
				%indvars.iv32 = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next33, %for.inc15 ]
				%2 = add nsw i64 %indvars.iv32, %1
				%arrayidx = getelementptr inbounds [100 x i32]* @B, i64 0, i64 %indvars.iv32
				%3 = trunc i64 %2 to i32
				store i32 %3, i32* %arrayidx
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 0, %for.body ], [ %indvars.iv.next, %for.body3 ]
				%arrayidx7 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv32, i64 %indvars.iv
				%4 = load i32* %arrayidx7
				%add10 = add nsw i32 %3, %4
				store i32 %add10, i32* %arrayidx7
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc15, label %for.body3

				for.inc15:
				%indvars.iv.next33 = add nuw nsw i64 %indvars.iv32, 1
				%lftr.wideiv35 = trunc i64 %indvars.iv32 to i32
				%exitcond36 = icmp eq i32 %lftr.wideiv35, %0
				br i1 %exitcond36, label %for.end17, label %for.body

				for.end17:
				ret void
				}

				; CHECK-LABEL: @interchange_05
				; CHECK: entry:
				; CHECK: %cmp30 = icmp sgt i32 %N, 0
				; CHECK: br i1 %cmp30, label %for.body.lr.ph, label %for.end17

				; CHECK: for.body.lr.ph:
				; CHECK: %0 = add i32 %N, -1
				; CHECK: %1 = zext i32 %k to i64
				; CHECK: br label %for.body

				; CHECK: for.body:
				; CHECK: %indvars.iv32 = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next33, %for.inc15 ]
				; CHECK: %2 = add nsw i64 %indvars.iv32, %1
				; CHECK: %arrayidx = getelementptr inbounds [100 x i32]* @B, i64 0, i64 %indvars.iv32
				; CHECK: %3 = trunc i64 %2 to i32
				; CHECK: store i32 %3, i32* %arrayidx
				; CHECK: br label %for.body3.preheader

				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3

				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 0, %for.body3.preheader ]
				; CHECK: %arrayidx7 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv32, i64 %indvars.iv
				; CHECK: %4 = load i32* %arrayidx7
				; CHECK: %add10 = add nsw i32 %3, %4
				; CHECK: store i32 %add10, i32* %arrayidx7
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %0
				; CHECK: br i1 %exitcond, label %for.inc15, label %for.body3

				; CHECK: for.inc15:
				; CHECK: %indvars.iv.next33 = add nuw nsw i64 %indvars.iv32, 1
				; CHECK: %lftr.wideiv35 = trunc i64 %indvars.iv32 to i32
				; CHECK: %exitcond36 = icmp eq i32 %lftr.wideiv35, %0
				; CHECK: br i1 %exitcond36, label %for.end17.loopexit, label %for.body

				; CHECK: for.end17.loopexit:
				; CHECK: br label %for.end17

				; CHECK: for.end17:
				; CHECK: ret void


				;;--------------------------------------Test case 06-------------------------------------
				;; Loops not tightly nested are not interchanged
				;; for(int j=0;j<N;j++) {
				;; foo();
				;; for(int i=2;i<N;i++)
				;; A[j][i] = A[j][i]+k;
				;; }

				define void @interchange_06(i32 %k, i32 %N) {
				entry:
				%cmp22 = icmp sgt i32 %N, 0
				br i1 %cmp22, label %for.body.lr.ph, label %for.end12

				for.body.lr.ph:
				%0 = add i32 %N, -1
				br label %for.body

				for.body:
				%indvars.iv24 = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next25, %for.inc10 ]
				tail call void (...)* @foo()
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 2, %for.body ]
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv24, i64 %indvars.iv
				%1 = load i32* %arrayidx5
				%add = add nsw i32 %1, %k
				store i32 %add, i32* %arrayidx5
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc10, label %for.body3

				for.inc10:
				%indvars.iv.next25 = add nuw nsw i64 %indvars.iv24, 1
				%lftr.wideiv26 = trunc i64 %indvars.iv24 to i32
				%exitcond27 = icmp eq i32 %lftr.wideiv26, %0
				br i1 %exitcond27, label %for.end12, label %for.body

				for.end12:
				ret void
				}
				;; Check that the inner phi is not split, which shows that the loops were not interchanged.
				; CHECK-LABEL: @interchange_06
				; CHECK: phi i64 [ %indvars.iv.next, %for.body3 ], [ 2, %for.body3.preheader ]
				; CHECK-NEXT: getelementptr
				; CHECK-NEXT: %1 = load

				;;--------------------------------------Test case 07-------------------------------------
				;; Test for interchange when we have an lcssa phi.
				;; for(gi=1;gi<N;gi++)
				;; for(gj=1;gj<M;gj++)
				;; A[gj][gi] = A[gj - 1][gi] + C[gj][gi];

				@gi = common global i32 0
				@gj = common global i32 0

				define void @interchange_07(i32 %N, i32 %M){
				entry:
				store i32 1, i32* @gi
				%cmp21 = icmp sgt i32 %N, 1
				br i1 %cmp21, label %for.cond1.preheader.lr.ph, label %for.end16

				for.cond1.preheader.lr.ph: ; preds = %entry
				%cmp218 = icmp sgt i32 %M, 1
				%gi.promoted = load i32* @gi
				%0 = add i32 %M, -1
				%1 = sext i32 %gi.promoted to i64
				%2 = sext i32 %N to i64
				%3 = add i32 %gi.promoted, 1
				%4 = icmp slt i32 %3, %N
				%smax = select i1 %4, i32 %N, i32 %3
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.cond1.preheader.lr.ph, %for.inc14
				%indvars.iv25 = phi i64 [ %1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next26, %for.inc14 ]
				br i1 %cmp218, label %for.body3, label %for.inc14

				for.body3: ; preds = %for.cond1.preheader, %for.body3
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.cond1.preheader ]
				%5 = add nsw i64 %indvars.iv, -1
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %5, i64 %indvars.iv25
				%6 = load i32* %arrayidx5
				%arrayidx9 = getelementptr inbounds [100 x [100 x i32]]* @C, i64 0, i64 %indvars.iv, i64 %indvars.iv25
				%7 = load i32* %arrayidx9
				%add = add nsw i32 %7, %6
				%arrayidx13 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv25
				store i32 %add, i32* %arrayidx13
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc14, label %for.body3

				for.inc14: ; preds = %for.body3, %for.cond1.preheader
				%inc.lcssa23 = phi i32 [ 1, %for.cond1.preheader ], [ %M, %for.body3 ]
				%indvars.iv.next26 = add nsw i64 %indvars.iv25, 1
				%cmp = icmp slt i64 %indvars.iv.next26, %2
				br i1 %cmp, label %for.cond1.preheader, label %for.cond.for.end16_crit_edge

				for.cond.for.end16_crit_edge: ; preds = %for.inc14
				store i32 %inc.lcssa23, i32* @gj
				store i32 %smax, i32* @gi
				br label %for.end16

				for.end16: ; preds = %for.cond.for.end16_crit_edge, %entry
				ret void
				}

				;; Check that the loops are interchanged and that the lcssa phis are split out properly.
				; CHECK-LABEL: @interchange_07
				; CHECK: for.cond1.preheader.lr.ph:
				; CHECK: %cmp218 = icmp sgt i32 %M, 1
				; CHECK: %gi.promoted = load i32* @gi
				; CHECK: %0 = add i32 %M, -1
				; CHECK: %1 = sext i32 %gi.promoted to i64
				; CHECK: %2 = sext i32 %N to i64
				; CHECK: %3 = add i32 %gi.promoted, 1
				; CHECK: %4 = icmp slt i32 %3, %N
				; CHECK: %smax = select i1 %4, i32 %N, i32 %3
				; CHECK: br i1 %cmp218, label %for.body3.preheader, label %for.inc14
				; CHECK: for.cond1.preheader:
				; CHECK: %indvars.iv25 = phi i64 [ %1, %for.body3 ], [ %indvars.iv.next26, %for.inc14.split ]
				; CHECK: br label %for.body3.split1
				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3
				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader ]
				; CHECK: br label %for.cond1.preheader
				; CHECK: for.body3.split1: ; preds = %for.cond1.preheader
				; CHECK: %5 = add nsw i64 %indvars.iv, -1
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %5, i64 %indvars.iv25
				; CHECK: %6 = load i32* %arrayidx5
				; CHECK: %arrayidx9 = getelementptr inbounds [100 x [100 x i32]]* @C, i64 0, i64 %indvars.iv, i64 %indvars.iv25
				; CHECK: %7 = load i32* %arrayidx9
				; CHECK: %add = add nsw i32 %7, %6
				; CHECK: %arrayidx13 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv25
				; CHECK: store i32 %add, i32* %arrayidx13
				; CHECK: br label %for.inc14.loopexit
				; CHECK: for.body3.split:
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %0
				; CHECK: br i1 %exitcond, label %for.inc14, label %for.body3
				; CHECK: for.inc14.loopexit:
				; CHECK: br label %for.inc14.split
				; CHECK: for.inc14:
				; CHECK: %inc.lcssa23 = phi i32 [ 1, %for.cond1.preheader.lr.ph ], [ %M, %for.body3.split ]
				; CHECK: br label %for.cond.for.end16_crit_edge
				; CHECK: for.inc14.split:
				; CHECK: %indvars.iv.next26 = add nsw i64 %indvars.iv25, 1
				; CHECK: %cmp = icmp slt i64 %indvars.iv.next26, %2
				; CHECK: br i1 %cmp, label %for.cond1.preheader, label %for.body3.split
				; CHECK: for.cond.for.end16_crit_edge:
				; CHECK: %inc.lcssa23.lcssa = phi i32 [ %inc.lcssa23, %for.inc14 ]
				; CHECK: store i32 %inc.lcssa23.lcssa, i32* @gj
				; CHECK: store i32 %smax, i32* @gi
				; CHECK: br label %for.end16
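
A note on test cases 03 and 04 in interchange.ll above. In test case 04 the statement reads A[j+1][i], an element that iteration (j+1, i-1) later overwrites, so the dependence has distance (1, -1) in (j, i) order and swapping the loops would reverse it. The C sketch below is an illustration only and is not code from the patch; the function names and the reduced array size N = 8 are assumptions made for brevity. It runs both kernels in the original and in the swapped loop order and compares the resulting arrays.

```c
#include <string.h>

#define N 8 /* assumed size for the sketch; the tests use 100 x 100 arrays */

typedef void (*kernel_fn)(int A[N][N], int k);

/* Fill A with distinct values so that any reordering of reads and writes is visible. */
static void init(int A[N][N]) {
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            A[r][c] = r * N + c;
}

/* Test case 03 kernel: each iteration touches only its own element. */
static void case03_original(int A[N][N], int k) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = A[i][j] + k;
}

static void case03_interchanged(int A[N][N], int k) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            A[i][j] = A[i][j] + k;
}

/* Test case 04 kernel: reads A[j+1][i] before iteration (j+1, i-1) overwrites it. */
static void case04_original(int A[N][N], int k) {
    for (int j = 0; j < N - 1; j++)
        for (int i = 0; i < N - 1; i++)
            A[j][i + 1] = A[j + 1][i] + k;
}

/* Same statement with the loops swapped: the write now happens before the read. */
static void case04_interchanged(int A[N][N], int k) {
    for (int i = 0; i < N - 1; i++)
        for (int j = 0; j < N - 1; j++)
            A[j][i + 1] = A[j + 1][i] + k;
}

/* Returns 1 when both loop orders leave the array in the same state, 0 otherwise. */
static int orders_agree(kernel_fn a, kernel_fn b, int k) {
    int X[N][N], Y[N][N];
    init(X);
    init(Y);
    a(X, k);
    b(Y, k);
    return memcmp(X, Y, sizeof X) == 0;
}

int case03_orders_agree(int k) {
    return orders_agree(case03_original, case03_interchanged, k);
}

int case04_orders_agree(int k) {
    return orders_agree(case04_original, case04_interchanged, k);
}
```

Under these assumptions, case04_orders_agree(1) returns 0: interchange changes the result, so the legality check has to reject it. case03_orders_agree(1) returns 1: interchange would be legal there, and the test expects it to be declined only because it is not profitable.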

test/Transforms/LoopInterchange/profitability.ll

				; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s
				;; These test cases exercise the profitability model.


				@A = common global [100 x [100 x i32]] zeroinitializer
				@B = common global [100 x [100 x i32]] zeroinitializer

				;;---------------------------------------Test case 01---------------------------------
				;; Loop interchange enables vectorization here and is hence profitable. Check that the loops are interchanged.
				;; for(int i=1;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; A[j][i] = A[j - 1][i] + B[j][i];

				define void @interchange_01(i32 %N) {
				entry:
				%cmp27 = icmp sgt i32 %N, 1
				br i1 %cmp27, label %for.cond1.preheader.lr.ph, label %for.end16

				for.cond1.preheader.lr.ph:
				%0 = add i32 %N, -1
				br label %for.body3.preheader

				for.body3.preheader:
				%indvars.iv30 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next31, %for.inc14 ]
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.body3.preheader ]
				%1 = add nsw i64 %indvars.iv, -1
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %1, i64 %indvars.iv30
				%2 = load i32* %arrayidx5
				%arrayidx9 = getelementptr inbounds [100 x [100 x i32]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				%3 = load i32* %arrayidx9
				%add = add nsw i32 %3, %2
				%arrayidx13 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				store i32 %add, i32* %arrayidx13
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc14, label %for.body3

				for.inc14:
				%indvars.iv.next31 = add nuw nsw i64 %indvars.iv30, 1
				%lftr.wideiv32 = trunc i64 %indvars.iv30 to i32
				%exitcond33 = icmp eq i32 %lftr.wideiv32, %0
				br i1 %exitcond33, label %for.end16, label %for.body3.preheader

				for.end16:
				ret void
				}
				;; Check only part of the output .ll to verify that the loops are interchanged.
				; CHECK-LABEL: @interchange_01
				; CHECK: for.body3.preheader: ; preds = %for.body3, %for.inc14.split
				; CHECK: %indvars.iv30 = phi i64 [ 1, %for.body3 ], [ %indvars.iv.next31, %for.inc14.split ]
				; CHECK: br label %for.body3.split2

				; CHECK: for.body3.preheader1:
				; CHECK: br label %for.body3

				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader1 ]
				; CHECK: br label %for.body3.preheader

				; CHECK: for.body3.split2:
				; CHECK: %1 = add nsw i64 %indvars.iv, -1
				; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %1, i64 %indvars.iv30
				; CHECK: %2 = load i32* %arrayidx5
				; CHECK: %arrayidx9 = getelementptr inbounds [100 x [100 x i32]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				; CHECK: %3 = load i32* %arrayidx9
				; CHECK: %add = add nsw i32 %3, %2
				; CHECK: %arrayidx13 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				; CHECK: store i32 %add, i32* %arrayidx13
				; CHECK: br label %for.inc14.split

				;; ---------------------------------------Test case 02---------------------------------
				;; Check loop interchange profitability model.
				;; This tests the profitability model when the operands of getelementptr are not exactly the induction variables
				;; but some arithmetic operation on them.
				;; for(int i=1;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; A[j-1][i-1] = A[j - 1][i-1] + B[j-1][i-1];

				define void @interchange_02(i32 %N) {
				entry:
				%cmp32 = icmp sgt i32 %N, 1
				br i1 %cmp32, label %for.cond1.preheader.lr.ph, label %for.end21

				for.cond1.preheader.lr.ph:
				%0 = add i32 %N, -1
				br label %for.body3.lr.ph

				for.body3.lr.ph:
				%indvars.iv35 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next36, %for.inc19 ]
				%1 = add nsw i64 %indvars.iv35, -1
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 1, %for.body3.lr.ph ], [ %indvars.iv.next, %for.body3 ]
				%2 = add nsw i64 %indvars.iv, -1
				%arrayidx6 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %2, i64 %1
				%3 = load i32* %arrayidx6
				%arrayidx12 = getelementptr inbounds [100 x [100 x i32]]* @B, i64 0, i64 %2, i64 %1
				%4 = load i32* %arrayidx12
				%add = add nsw i32 %4, %3
				store i32 %add, i32* %arrayidx6
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc19, label %for.body3

				for.inc19:
				%indvars.iv.next36 = add nuw nsw i64 %indvars.iv35, 1
				%lftr.wideiv38 = trunc i64 %indvars.iv35 to i32
				%exitcond39 = icmp eq i32 %lftr.wideiv38, %0
				br i1 %exitcond39, label %for.end21, label %for.body3.lr.ph

				for.end21:
				ret void
				}
				; CHECK-LABEL: @interchange_02
				; CHECK: for.body3.lr.ph:
				; CHECK: %indvars.iv35 = phi i64 [ 1, %for.body3 ], [ %indvars.iv.next36, %for.inc19.split ]
				; CHECK: %1 = add nsw i64 %indvars.iv35, -1
				; CHECK: br label %for.body3.split1
				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3
				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader ]
				; CHECK: br label %for.body3.lr.ph
				; CHECK: for.body3.split1:
				; CHECK: %2 = add nsw i64 %indvars.iv, -1
				; CHECK: %arrayidx6 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %2, i64 %1
				; CHECK: %3 = load i32* %arrayidx6
				; CHECK: %arrayidx12 = getelementptr inbounds [100 x [100 x i32]]* @B, i64 0, i64 %2, i64 %1
				; CHECK: %4 = load i32* %arrayidx12
				; CHECK: %add = add nsw i32 %4, %3
				; CHECK: store i32 %add, i32* %arrayidx6
				; CHECK: br label %for.inc19.split

				;;---------------------------------------Test case 03---------------------------------
				;; Loop interchange is not profitable.
				;; for(int i=1;i<N;i++)
				;; for(int j=1;j<N;j++)
				;; A[i-1][j-1] = A[i - 1][j-1] + B[i][j];

				define void @interchange_03(i32 %N){
				entry:
				%cmp31 = icmp sgt i32 %N, 1
				br i1 %cmp31, label %for.cond1.preheader.lr.ph, label %for.end19

				for.cond1.preheader.lr.ph:
				%0 = add i32 %N, -1
				br label %for.body3.lr.ph

				for.body3.lr.ph:
				%indvars.iv34 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next35, %for.inc17 ]
				%1 = add nsw i64 %indvars.iv34, -1
				br label %for.body3

				for.body3:
				%indvars.iv = phi i64 [ 1, %for.body3.lr.ph ], [ %indvars.iv.next, %for.body3 ]
				%2 = add nsw i64 %indvars.iv, -1
				%arrayidx6 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %1, i64 %2
				%3 = load i32* %arrayidx6
				%arrayidx10 = getelementptr inbounds [100 x [100 x i32]]* @B, i64 0, i64 %indvars.iv34, i64 %indvars.iv
				%4 = load i32* %arrayidx10
				%add = add nsw i32 %4, %3
				store i32 %add, i32* %arrayidx6
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %0
				br i1 %exitcond, label %for.inc17, label %for.body3

				for.inc17:
				%indvars.iv.next35 = add nuw nsw i64 %indvars.iv34, 1
				%lftr.wideiv37 = trunc i64 %indvars.iv34 to i32
				%exitcond38 = icmp eq i32 %lftr.wideiv37, %0
				br i1 %exitcond38, label %for.end19, label %for.body3.lr.ph

				for.end19:
				ret void
				}

				; CHECK-LABEL: @interchange_03
				; CHECK: for.body3.lr.ph:
				; CHECK: %indvars.iv34 = phi i64 [ 1, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next35, %for.inc17 ]
				; CHECK: %1 = add nsw i64 %indvars.iv34, -1
				; CHECK: br label %for.body3.preheader
				; CHECK: for.body3.preheader:
				; CHECK: br label %for.body3
				; CHECK: for.body3:
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.body3.preheader ]
				; CHECK: %2 = add nsw i64 %indvars.iv, -1
				; CHECK: %arrayidx6 = getelementptr inbounds [100 x [100 x i32]]* @A, i64 0, i64 %1, i64 %2
				; CHECK: %3 = load i32* %arrayidx6
				; CHECK: %arrayidx10 = getelementptr inbounds [100 x [100 x i32]]* @B, i64 0, i64 %indvars.iv34, i64 %indvars.iv
				; CHECK: %4 = load i32* %arrayidx10


Revision Contents

Diff 20374

include/llvm/InitializePasses.h

include/llvm/LinkAllPasses.h

include/llvm/Transforms/Scalar.h

lib/Transforms/Scalar/CMakeLists.txt

lib/Transforms/Scalar/LoopInterchange.cpp

lib/Transforms/Scalar/Scalar.cpp

test/Transforms/LoopInterchange/currentLimitation.ll

test/Transforms/LoopInterchange/interchange.ll

test/Transforms/LoopInterchange/profitability.ll
