This is an archive of the discontinued LLVM Phabricator instance.

[LoopInterchange] Add support to interchange loops with reductions.
ClosedPublic

Authored by karthikthecool on Mar 13 2015, 2:50 AM.

Details

Reviewers

rengolin
nadav
aschwaighofer
jmolloy
hfinkel

Commits

rL235571: Add support to interchange loops with reductions.

Summary

Hi Hal,
Please find attached the patch to enable interchange of loops having reductions. The logic to detect a reduction/induction is borrowed from loop vectorizer code.

With this change we are now able to interchange matrix multiplication code such as one below-

for( int i=1;i<2048;i++)
  for( int j=1;j<2048;j++)
    for( int k=1;k<2048;k++) 
      A[i][j]+=B[i][k]*C[k][j];

into -

for( int k=1;k<2048;k++) 
  for( int i=1;i<2048;i++)
    for( int j=1;j<2048;j++)
      A[i][j]+=B[i][k]*C[k][j];

which now gets vectorized.

We observe a ~3X execution time improvement in the above code.
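The transformation above can be illustrated outside the compiler with a small standalone sketch. It is not code from the patch: the size `N`, the zero-based bounds, and the helpers `makeMatrix`, `multiplyIJK`, `multiplyKIJ` and `checkInterchange` are all assumptions made for illustration. It only shows that, for integer data, both loop orders compute the same result, while the interchanged order makes the innermost loop walk `A` and `C` contiguously.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8; // illustrative size; the review used 2048

using Matrix = std::array<std::array<int, N>, N>;

// Fill a matrix with a small deterministic pattern (hypothetical helper).
Matrix makeMatrix(int Seed) {
  Matrix M{};
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      M[i][j] = static_cast<int>((i * N + j) % 7) + Seed;
  return M;
}

// Original order (i, j, k): the innermost loop carries the reduction into
// A[i][j] and reads C one row further down on every iteration.
Matrix multiplyIJK(const Matrix &B, const Matrix &C) {
  Matrix A{};
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t k = 0; k < N; ++k)
        A[i][j] += B[i][k] * C[k][j];
  return A;
}

// Interchanged order (k, i, j): the innermost loop now runs over j, so A and
// C are accessed contiguously and the reduction moves to the outermost loop.
Matrix multiplyKIJ(const Matrix &B, const Matrix &C) {
  Matrix A{};
  for (std::size_t k = 0; k < N; ++k)
    for (std::size_t i = 0; i < N; ++i)
      for (std::size_t j = 0; j < N; ++j)
        A[i][j] += B[i][k] * C[k][j];
  return A;
}

// Usage: both orders must agree on every element.
void checkInterchange() {
  const Matrix B = makeMatrix(1), C = makeMatrix(2);
  assert(multiplyIJK(B, C) == multiplyKIJ(B, C));
}
```

The equality holds exactly here because integer addition can be reordered freely; for floating-point data the reordering is only permitted under relaxed (fast-math) semantics.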
Please let me know your inputs on the same.

Thanks and Regards
Karthik Bhat

Event Timeline

karthikthecool updated this revision to Diff 21907. Mar 13 2015, 2:50 AM

karthikthecool retitled this revision from to [LoopInterchange] Add support to interchange loops with reductions..

karthikthecool updated this object.

karthikthecool edited the test plan for this revision.

karthikthecool added reviewers: hfinkel, jmolloy.

karthikthecool added a subscriber: Unknown Object (MLST).

Hi Karthik,

Wouldn't this be better to be integrated into the vectorizer? Or at least, not duplicated code from the vectorizer?

Also, you have at least two other bold moves:

Move a lot of internal methods to static. This could be ok, but needs some further explanation why it is so.
Move the interchange pass down the pipeline. Again, it could be the right thing to do, but also needs some context.

Finally, how did you test and benchmark your changes? "It makes my code 3x faster" is not enough to include such a big change. The steps you need to complete to get such a change in are in order:

Run the "make check-all" tests and make sure they're green
Run the test-suite and make sure it passes *and* doesn't regress performance
Provide more information about the benchmarks that you tested it, relative improvements, regressions, etc.

cheers,
--renato

Hi Renato,
Thanks for looking into the patch. Please find my comments inline.

Wouldn't this be better to be integrated into the vectorizer? Or at least, not duplicated code from the vectorizer?

Since this is a complete pass, integrating it with the loop vectorizer would be a bit difficult. I can try to move common functions into a utility file if required.

Also, you have at least two other bold moves:

Move a lot of internal methods to static. This could be ok, but needs some further explanation why it is so.

These methods are helper functions. Helper functions are usually marked static in other passes; I had missed that in my initial commit and have corrected it here.

Move the interchange pass down the pipeline. Again, it could be the right thing to do, but also needs some context.

I had to move the interchange pass down the pipeline and add the LICM and loop unswitch passes after loop interchange because, after the inner loop header/loop latches are split and the loops are interchanged, we get multiple blocks with successors outside the loop (i.e. getExitingBlock was null). This kind of loop is not handled by the vectorizer, hence I had to run LICM and loop unswitch to remove unconditional branches between blocks in the inner loop.

Finally, how did you test and benchmark your changes?

I have run the LLVM LNT test cases but unfortunately didn't observe much improvement with this patch. I'm planning to run the Phoronix test suites to see if it gives an improvement in some known benchmark. The matrix multiplication code was from one of our internal test cases, which showed improvement with this patch. I have added a few test cases (positive and negative) and ran make check-all to make sure there are no regressions. I have also written a few local C test cases to check that the outputs are the same after interchange.

Run the "make check-all" tests and make sure they're green

Yes, make check-all passes with this patch. I have also added new test cases to test this feature.

Run the test-suite and make sure it passes *and* doesn't regress performance

Provide more information about the benchmarks that you tested it, relative improvements, regressions, etc.

Yes, I ran the LLVM LNT benchmarks but didn't observe much improvement. No regressions were observed either. I have observed one crash in the Dependency Analysis module which is triggered after this patch. Since this pass is currently disabled by default, I was planning to address the crash in the Dependency Analysis module separately. Will that be OK?

I will try to get more benchmark data using phoronix test suites over the weekend and get back to you.

Thanks once again for looking into the patch. Looking forward to your inputs on the patch.
Regards
Karthik Bhat

Hi Hal,Renato,
Refactored some common code into functions. I have currently borrowed and modified some functions from the loop vectorizer. Do I need to refactor them into a common utility as well? Functions such as AddReductionVar seem to be rather tightly bound to the loop vectorizer code.

The second change is in PassManagerBuilder. Running SimplifyCFGPass after LoopInterchange is sufficient to merge and remove the redundant basic blocks (blocks with just an unconditional branch) produced by loop interchange. Updated the code to reflect this.

I ran a few Phoronix benchmarks and LNT benchmarks but unfortunately didn't see any improvement/regression due to this patch.

As mentioned in previous comments post this change code such as-

void matrixMult(int N, int M, int K) {
  for(int i=0;i<N;i++)
    for(int j=0;j<M;j++)
      for(int k=0;k<K;k++)
        A[i][j]+=B[i][k]*C[k][j];
}

gets vectorized, giving some execution time improvement during large matrix multiplication.

Please let me know your inputs on the same.
Also, is there a matrix multiplication benchmark which I can test to see if the kind of code I mentioned above gets triggered?

Thanks and Regards
Karthik Bhat

rengolin added reviewers: nadav, aschwaighofer. Mar 21 2015, 8:18 AM

rengolin set the repository for this revision to rL LLVM.

In D8314#141802, @karthikthecool wrote:

Refactored some common code into functions. I have currently borrowed and modified some functions from the loop vectorizer. Do I need to refactor them into a common utility as well? Functions such as AddReductionVar seem to be rather tightly bound to the loop vectorizer code.

Yes, they are, and I can see what the problem is. But there is a lot of duplication added by this patch and I'm still uncomfortable. I've added Nadav and Arnold, our loop vectorizer experts, to assist on what to do next.

I strongly suggest against duplication, and the only option I can think of is to spot the pattern while creating the reduction variable. You can create a function to iterate all containing loops and inspect all the ranges to make sure they match your pattern. Early exits should be made if the loop is not deep enough, or the outer loops don't iterate through any of the affected induction variables in your reduction.

The second change is in PassManagerBuilder. Running SimplifyCFGPass after LoopInterchange is sufficient to merge and remove the redundant basic blocks (blocks with just an unconditional branch) produced by loop interchange. Updated the code to reflect this.

This is good news. Means that the pass is a lot less dramatic than you anticipated. :) This gives me hope that doing this inside the loop vectorizer can be managed.

I ran a few Phoronix benchmarks and LNT benchmarks but unfortunately didn't see any improvement/regression due to this patch.

I'd say "fortunately", since you haven't introduced any regressions, and that's a great thing!

As mentioned in previous comments post this change code such as-
void matrixMult(int N, int M, int K) {
  for(int i=0;i<N;i++)
    for(int j=0;j<M;j++)
      for(int k=0;k<K;k++)
        A[i][j]+=B[i][k]*C[k][j];
}
gets vectorized, giving some execution time improvement during large matrix multiplication.

It seems we don't have that kind of benchmark on our test suite, and it would be good to have one. I don't know one off the top of my head, but maybe Hal/Nadav/Arnold could help.

cheers,
--renato

Thanks Renato, Tobias for your inputs. Please find my comments inline-

Yes, they are, and I can see what the problem is. But there is a lot of duplication added by this patch and I'm still uncomfortable. I've added Nadav and Arnold, our loop vectorizer experts, to assist on what to do next.

Sure. The functions currently duplicated are - isReductionPHI and helper functions and isInductionPHI. Is it ok to move them to somewhere like LoopBase. Loop can expose an API to check if a variable is induction/reduction in the loop? It can be reused by other modules in this case.

I strongly suggest against duplication, and the only option I can think of is to spot the pattern while creating the reduction variable. You can create a function to iterate all containing loops and inspect all the ranges to make sure they match your pattern. Early exits should be made if the loop is not deep enough, or the outer loops don't iterate through any of the affected induction variables in your reduction.

We do have early exits in the code; they were checked in with the initial version. E.g. the minimum loop depth check and the dependency checks to see if interchange is safe are done before populating the reduction/induction PHIs.
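As a rough illustration of the kind of dependency check mentioned above, the sketch below applies the rule quoted in the comments of LoopInterchange.cpp: a permutation is legal if, after permuting the columns of the direction matrix, no row has ">" as its leftmost non-"=" direction. This is a simplified standalone version under stated assumptions, not the pass's code: the names `rowIsLegal` and `isLegalAfterInterchange` are hypothetical, and the 'S' and 'I' entries that the real pass also handles are ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One row per dependence, one column per loop level (outermost first).
using CharMatrix = std::vector<std::vector<char>>;

// A row is acceptable if its leftmost non-'=' direction is not '>'.
static bool rowIsLegal(const std::vector<char> &Row) {
  for (char Dir : Row) {
    if (Dir == '=')
      continue;
    return Dir != '>';
  }
  return true; // all '=': nothing is carried by any loop in this row
}

// Swap the columns of the two loops in a local copy and re-check every row.
bool isLegalAfterInterchange(CharMatrix DepMatrix, std::size_t LoopA,
                             std::size_t LoopB) {
  for (std::vector<char> &Row : DepMatrix) {
    assert(LoopA < Row.size() && LoopB < Row.size());
    std::swap(Row[LoopA], Row[LoopB]);
    if (!rowIsLegal(Row))
      return false;
  }
  return true;
}
```

For example, a row `(=, =, <)` stays legal when the first and last loops are swapped, whereas a row `(<, >)` becomes `(>, <)` and is rejected.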

This is good news. Means that the pass is a lot less dramatic than you anticipated. :) This gives me hope that doing this inside the loop vectorizer can be managed.

Yes :) . But I still feel that this should be a separate pass and should not be moved inside the loop vectorizer, because loop interchange is not specifically for vectorization of code. Based on the profitability model it can be used for cache reuse, register reuse, etc. So having it as a separate pass looks like a good option to me. Please let me know if you feel otherwise.

It seems we don't have that kind of benchmark on our test suite, and it would be good to have one. I don't know one off the top of my head, but maybe Hal/Nadav/Arnold could help.

Thanks Tobias for pointing out the test case SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemm/gemm.c. But the loop inside the test case is something like -

for (i = 0; i < _PB_NI; i++)
  for (j = 0; j < _PB_NJ; j++) {
    C[i][j] *= beta;
    for (k = 0; k < _PB_NK; ++k)
      C[i][j] += alpha * A[i][k] * B[k][j];
  }

Since loops j and k are not tightly nested we currently do not interchange these loops. As you mentioned loop interchange will give better results when it works along with other passes such as loop tiling, loop splitting etc.
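To make the "not tightly nested" point concrete, here is a hand-written sketch, not something the pass does. It assumes the scaling statement can be moved into its own loop nest (loop distribution), which leaves a perfect i-j-k nest of the form discussed earlier in this review. The sizes, type aliases and function names (`gemmImperfect`, `gemmDistributed`) are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t NI = 4, NJ = 5, NK = 6; // illustrative sizes

using MatC = std::array<std::array<double, NJ>, NI>;
using MatA = std::array<std::array<double, NK>, NI>;
using MatB = std::array<std::array<double, NJ>, NK>;

// Shape of the gemm kernel quoted above: the scaling statement sits between
// the j and k loops, so j and k are not tightly nested.
void gemmImperfect(MatC &C, const MatA &A, const MatB &B, double Alpha,
                   double Beta) {
  for (std::size_t i = 0; i < NI; ++i)
    for (std::size_t j = 0; j < NJ; ++j) {
      C[i][j] *= Beta;
      for (std::size_t k = 0; k < NK; ++k)
        C[i][j] += Alpha * A[i][k] * B[k][j];
    }
}

// Hand-distributed variant: the scaling runs in its own nest first, leaving
// a perfectly nested i-j-k loop that has the shape interchange expects.
void gemmDistributed(MatC &C, const MatA &A, const MatB &B, double Alpha,
                     double Beta) {
  for (std::size_t i = 0; i < NI; ++i)
    for (std::size_t j = 0; j < NJ; ++j)
      C[i][j] *= Beta;

  for (std::size_t i = 0; i < NI; ++i)
    for (std::size_t j = 0; j < NJ; ++j)
      for (std::size_t k = 0; k < NK; ++k)
        C[i][j] += Alpha * A[i][k] * B[k][j];
}
```

Each element of `C` sees the same sequence of operations in both variants (scale once, then accumulate over k in order), so the two functions are expected to agree element by element.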

Thanks for spending your valuable time on this patch. Is it ok to go ahead with moving duplicated code into somewhere as a Loop API/function or if you could suggest some better place to reduce code duplication.
Any other comments in general about the patch is welcome as well..:)
Thanks again for your time.
Regards
Karthik Bhat

In D8314#145073, @karthikthecool wrote:

Sure. The functions currently duplicated are - isReductionPHI and helper functions and isInductionPHI. Is it ok to move them to somewhere like LoopBase. Loop can expose an API to check if a variable is induction/reduction in the loop? It can be reused by other modules in this case.

I'm not sure this would work, because isReduction/Induction in the vectorizer is very specific to the internal vectorizer's data structures. Unless you manage to peel their common functionality without messing the vectorizer too much.

I agree that this pass is not just good for vectorizing, but it's applicable to a very specific use case which the whole point is to get vectorized. I'll see if I can get Nadav/Arnold's attention to this review...

cheers,
--renato

In D8314#145073, @karthikthecool wrote:

Sure. The functions currently duplicated are - isReductionPHI and helper functions and isInductionPHI. Is it ok to move them to somewhere like LoopBase. Loop can expose an API to check if a variable is induction/reduction in the loop? It can be reused by other modules in this case.

Karthik,

Can you try and refactor those two functions to be generic and make both loop vectorizer and interchange to use it?

cheers,
--renato

Hi Renato,
Sorry for the delay in following up on this patch; I was stuck in some other work. I was able to refactor isInductionVar out of LoopVectorizer and we are using the refactored function in this patch.

Re-factoring AddReductionVar though seems a bit tricky as i'm not sure if we should expose all the enum/structs that are currently being used to support it. I'm currently only using a part of that code.

I have added a TODO for the same (i.e. re-factor isReductionPHI) for now. Is it OK to address refactoring of isReductionPHI in future? I wanted to address few other pending issues in LoopInterchange.

Please let me know if you feel otherwise. Will try to refactor the code on priority.
Thanks a lot for your time and help I really appreciate it.

Regards
Karthik Bhat

Add full context diff.

A gentle ping..

Hi Hal,Renato,
A gentle ping for review.

Hi Karthik,

We were all in the EuroLLVM, and I'm still not back home. Sorry for the
delay, I'll look at it first thing tomorrow.

Cheers,
Renato

Hi Karthik,

Copying the vectorizer's reduction detection here is not the way forward. Please, refactor the detection part into a generic function.

My initial guess would be to create a vectorizer common library in lib/Transforms/Utils and move the reduction detection in there, like the other loop utilities, and get the vectorizer and your pass to use that.

cheers,
--renato

lib/Transforms/Scalar/LoopInterchange.cpp
104	avoid white space/empty line changes with code changes.
373	I don't think we should add this code here, not even with a TODO to refactor this, because this TODO will never be done.

Hi Renato,
Updated code as per review comments. Refactored the reduction identification code out of the loop vectorizer and reused it in this pass. This code assumes D9046 has been applied.

Please let me know your inputs on this.

Thanks for your continued support. I really appreciate it.

Thanks and Regards
Karthik Bhat

Hi Karthik,

Looks a lot better, thanks!

I'll let @jmolloy review this one, as he was more tuned to this issue. I have no more concerns, thank you.

cheers,
--renato

Hi Renato,
Thanks for the review and time. Updating code to reflect changes done in r235284.

Hi Hal, James,
Could you please share your valuable inputs on this patch.
Thanks and Regards
Karthik Bhat

Hi Karthik,

The patch is looking good, apart from the few comments. I'd welcome @jmolloy's comments.

cheers,
--renato

lib/Transforms/Scalar/LoopInterchange.cpp
611	nitpick, please join this "else if".
628	nitpick, please join this "else if".
test/Transforms/LoopInterchange/reductions.ll
4	This is a generic test; it must not contain a target triple or it will fail on all arches except x86_64. We do want to test this on ARM, MIPS, PPC, etc., so we should remove the triple and make sure it works on all buildbots.
125	These check lines are bound to fail on multiple architectures and with other optimisations coming in later, they could break the sequence or reorder instructions. You need to parse what's really relevant in the right order with the right arguments stored in variables "[[foo]]" and removed of architecture types to make sure it passes on 16/32/64 bit machines. The same is true for the other tests.

Hi Karthik,

See my inline comments.

Cheers,

James

lib/Transforms/Scalar/LoopInterchange.cpp
258	s/Theorm/Theorem
590	Can you rename this "areAllUsesReductions"? it sounds more like a boolean query, which is what this is. "Check" implies some action.
595	You can merge these two if's: if (!UserIns || !ReductionDescriptor::isReductionPHI(UserIns, L, RD)) return false; If you wanted to be really cool... return !std::any(Ins->user_begin(), Ins->user_end(), [](User *U) { PHINode *UserIns = dyn_cast<PHINode>(*I); ReductionDescriptor RD; return UserIns && ReductionDescriptor::isReductionPHI(UserIns, L, RD); });
696	just: if (!L->getLoopLatch() || !L->getLoopPredecessor()) return false;
707	Probably a good idea to have some debugging output here saying what PHI failed to be recognized?
732	Assert that this is 2?
757	Some debugging output saying why it failed would be nice
760	I don't like this, you're mutating the content of the class for no good reason. It would be better to explicitly give Inductions and Reductions (a stack local variable) to populateInductionAndReductions(), and rename it to findInductionAndReductions()
1063	You found this out in LoopInterchangeLegality, but threw away the result. Why recalculate it here?
1081	s/TODO/FIXME
1093	Just: for (auto U : PHI->users())
1138	Needs a bailout if I is not a PHINode.

Hi James, Renato,
Thanks for the comments. Updated the code to address review comments in LoopInterchange. Please find my comments inline.
Thanks and Regards
Karthik Bhat

lib/Transforms/Scalar/LoopInterchange.cpp
258	Updated.
590	Yes, the new naming seems much better. Updated code.
595	Wow that was cool..:) I got to learn and use a lambda function. Updated code. But I think it should be: return !std::any_of(Ins->user_begin(), Ins->user_end(), [=](User *U) { PHINode *UserIns = dyn_cast<PHINode>(U); ReductionDescriptor RD; return !UserIns || !ReductionDescriptor::isReductionPHI(UserIns, L, RD); });
611	Updated code.
628	Updated code.
696	Updated code.
707	Added Debugging output.
732	Oops. This should be getNumSuccessors(). Updated code and added assertion.
757	Done.
760	Updated code.
1063	Modified code. Reusing value calculated by LoopInterchangeLegality in LoopInterchangeTransform.
1081	Done.
1093	I think this for loop was redundant. I wanted to replace all uses of PHI with that of incoming value from Header. PHI->replaceAllUsesWith(V); will suffice. This loop is not required deleted the same.
1138	Updated code to use "cast" instead of "dyn_cast" as I will always be a PHINode if it enters the for loop.
test/Transforms/LoopInterchange/reductions.ll
4	Won't this only run if the triple is x86_64-unknown-linux-gnu? But I agree I need to add some general tests as well.
125	The problem here was I wanted to check if loops are interchanged properly and for that i was checking the complete structure of interchanged loop for correctness. I will try to reduce the checks.
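The exchange about line 595 comes down to how "all users are reductions" is spelled with standard algorithms. The standalone sketch below uses a made-up `FakeUser` type in place of LLVM's `User`/`PHINode` (an assumption for illustration only, as are the function names) and shows that the explicit loop, the corrected `!std::any_of(... not a reduction ...)` form, and the more direct `std::all_of` form agree.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an LLVM user: records whether it is a PHI node
// and whether that PHI was recognised as a reduction.
struct FakeUser {
  bool IsPHI;
  bool IsReduction;
};

static bool isReductionUser(const FakeUser &U) {
  return U.IsPHI && U.IsReduction;
}

// Explicit loop: bail out on the first user that is not a reduction PHI.
bool allUsesAreReductionsLoop(const std::vector<FakeUser> &Users) {
  for (const FakeUser &U : Users)
    if (!isReductionUser(U))
      return false;
  return true;
}

// Corrected form from the reply: no user may fail the reduction test.
bool allUsesAreReductionsAnyOf(const std::vector<FakeUser> &Users) {
  return !std::any_of(Users.begin(), Users.end(),
                      [](const FakeUser &U) { return !isReductionUser(U); });
}

// Equivalent and arguably more direct: every user passes the test.
bool allUsesAreReductionsAllOf(const std::vector<FakeUser> &Users) {
  return std::all_of(Users.begin(), Users.end(), isReductionUser);
}
```

Note that negating the result of `any_of` without also negating the predicate would instead answer "no user is a reduction", which is a different question.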

Hi James,Renato,
Updated LoopInterchange.cpp as per review comments. Please have a look when you find time.
Thanks a lot for your time and guidance.
Thanks and Regards
Karthik Bhat

rengolin added inline comments. Apr 21 2015, 5:28 AM

test/Transforms/LoopInterchange/reductions.ll
4	No. To force it to run only on one architecture is to either REQUIRE: x86_64 or to move it into a directory that is platform-specific, with a lit.cfg file that does the same thing. But we want neither. Since this is not a platform pass, I think this optimisation should be tested in all platforms, unless we have a good reason not to. Taking away the triple will ensure that the target will be picked as the host, which is what we want. The IR might have to change to accommodate on other targets. I can test it on ARM/AArch64 to be sure.
125	That's fine, but you can't rely on the types and names as much as in the sequence of instructions. Only parse types and variables if they must be of a specific pattern. If not, just check the instruction names and correct order.

karthikthecool added inline comments. Apr 21 2015, 5:38 AM

test/Transforms/LoopInterchange/reductions.ll
4	Interesting, because I had added some test cases with a target triple in the initial check-in of this pass and they seem to pass. Also I saw similar tests in LoopReroll, which I took as reference. But I agree we need generic tests. I will add/update the same. Wow, it would be great if you could help me with the results on ARM/AArch64. I will try out the same as well.
125	Got your point. Will update the tests by tomorrow. Thanks for the comments.:)

OK, this looks OK to me now.

I tested on ARM and it works if you remove the data layout / triple from the test. Once you remove that and change the CHECK lines to be a bit less specific, looks good to me, too. Thanks!

Hi Renato,
Updated test cases as per comments. Please let me know if this looks good to you.
Verified with Debug+Assert build and make check-all on X86_64.

Thanks and Regards
Karthik Bhat

Hi Karthik,

Looks good to me with the two nitpicks. Feel free to commit with those changes.

Thanks!
-renato

test/Transforms/LoopInterchange/reductions.ll
50	nitpick: you don't need to match %0 here. It won't match if some pass adds a new unrelated operation. Just match up to promoted.
114	Same here. Mainly because %1 and %0 were not part of the rest of the match.

This revision is now accepted and ready to land. Apr 22 2015, 9:53 AM

Closed by commit rL235571: Add support to interchange loops with reductions. (authored by karthik). Apr 22 2015, 9:55 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/IPO/PassManagerBuilder.cpp (revision 233368)	5 lines
lib/Transforms/Scalar/LoopInterchange.cpp (revision 233368)	462 lines
test/Transforms/LoopInterchange/reductions.ll (revision 0)	290 lines

Diff 22783

lib/Transforms/IPO/PassManagerBuilder.cpp

Context not available.
	MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars	MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars
	MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.	MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.
	MPM.add(createLoopDeletionPass()); // Delete dead loops	MPM.add(createLoopDeletionPass()); // Delete dead loops
	if (EnableLoopInterchange)	if (EnableLoopInterchange) {
	MPM.add(createLoopInterchangePass()); // Interchange loops	MPM.add(createLoopInterchangePass()); // Interchange loops
		MPM.add(createCFGSimplificationPass());
		}
	if (!DisableUnrollLoops)	if (!DisableUnrollLoops)
	MPM.add(createSimpleLoopUnrollPass()); // Unroll small loops	MPM.add(createSimpleLoopUnrollPass()); // Unroll small loops
	addExtensionsToPM(EP_LoopOptimizerEnd, MPM);	addExtensionsToPM(EP_LoopOptimizerEnd, MPM);
Context not available.

lib/Transforms/Scalar/LoopInterchange.cpp

Context not available.
	#include "llvm/IR/IRBuilder.h"	#include "llvm/IR/IRBuilder.h"
	#include "llvm/IR/InstIterator.h"	#include "llvm/IR/InstIterator.h"
	#include "llvm/IR/IntrinsicInst.h"	#include "llvm/IR/IntrinsicInst.h"
		#include "llvm/IR/Module.h"
	#include "llvm/Pass.h"	#include "llvm/Pass.h"
	#include "llvm/Support/Debug.h"	#include "llvm/Support/Debug.h"
	#include "llvm/Support/raw_ostream.h"	#include "llvm/Support/raw_ostream.h"
Context not available.
	}	}
	#endif	#endif

	bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level, Loop *L,	static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
	DependenceAnalysis *DA) {	Loop *L, DependenceAnalysis *DA) {
	typedef SmallVector<Value *, 16> ValueVector;	typedef SmallVector<Value *, 16> ValueVector;
	ValueVector MemInstr;	ValueVector MemInstr;

Context not available.
	MemInstr.push_back(I);	MemInstr.push_back(I);
	}	}
	}	}

	rengolin: avoid white space/empty line changes with code changes.
	DEBUG(dbgs() << "Found " << MemInstr.size()	DEBUG(dbgs() << "Found " << MemInstr.size()
	<< " Loads and Stores to analyze\n");	<< " Loads and Stores to analyze\n");

Context not available.

	// A loop is moved from index 'from' to an index 'to'. Update the Dependence	// A loop is moved from index 'from' to an index 'to'. Update the Dependence
	// matrix by exchanging the two columns.	// matrix by exchanging the two columns.
	void interChangeDepedencies(CharMatrix &DepMatrix, unsigned FromIndx,	static void interChangeDepedencies(CharMatrix &DepMatrix, unsigned FromIndx,
	unsigned ToIndx) {	unsigned ToIndx) {
	unsigned numRows = DepMatrix.size();	unsigned numRows = DepMatrix.size();
	for (unsigned i = 0; i < numRows; ++i) {	for (unsigned i = 0; i < numRows; ++i) {
	char TmpVal = DepMatrix[i][ToIndx];	char TmpVal = DepMatrix[i][ToIndx];
Context not available.

	// Checks if outermost non '=','S'or'I' dependence in the dependence matrix is	// Checks if outermost non '=','S'or'I' dependence in the dependence matrix is
	// '>'	// '>'
	bool isOuterMostDepPositive(CharMatrix &DepMatrix, unsigned Row,	static bool isOuterMostDepPositive(CharMatrix &DepMatrix, unsigned Row,
	unsigned Column) {	unsigned Column) {
	for (unsigned i = 0; i <= Column; ++i) {	for (unsigned i = 0; i <= Column; ++i) {
	if (DepMatrix[Row][i] == '<')	if (DepMatrix[Row][i] == '<')
	return false;	return false;
Context not available.
	}	}

	// Checks if no dependence exist in the dependency matrix in Row before Column.	// Checks if no dependence exist in the dependency matrix in Row before Column.
	bool containsNoDependence(CharMatrix &DepMatrix, unsigned Row,	static bool containsNoDependence(CharMatrix &DepMatrix, unsigned Row,
	unsigned Column) {	unsigned Column) {
	for (unsigned i = 0; i < Column; ++i) {	for (unsigned i = 0; i < Column; ++i) {
	if (DepMatrix[Row][i] != '=' || DepMatrix[Row][i] != 'S' ||	if (DepMatrix[Row][i] != '=' || DepMatrix[Row][i] != 'S' ||
	DepMatrix[Row][i] != 'I')	DepMatrix[Row][i] != 'I')
Context not available.
	return true;	return true;
	}	}

	bool validDepInterchange(CharMatrix &DepMatrix, unsigned Row,	static bool validDepInterchange(CharMatrix &DepMatrix, unsigned Row,
	unsigned OuterLoopId, char InnerDep, char OuterDep) {	unsigned OuterLoopId, char InnerDep,
		char OuterDep) {

	if (isOuterMostDepPositive(DepMatrix, Row, OuterLoopId))	if (isOuterMostDepPositive(DepMatrix, Row, OuterLoopId))
	return false;	return false;
Context not available.
	// [Theorm] A permutation of the loops in a perfect nest is legal if and only if	// [Theorm] A permutation of the loops in a perfect nest is legal if and only if
	// the direction matrix, after the same permutation is applied to its columns,	// the direction matrix, after the same permutation is applied to its columns,
		jmolloy: s/Theorm/Theorem
		karthikthecool: Updated.
	// has no ">" direction as the leftmost non-"=" direction in any row.	// has no ">" direction as the leftmost non-"=" direction in any row.
	bool isLegalToInterChangeLoops(CharMatrix &DepMatrix, unsigned InnerLoopId,	static bool isLegalToInterChangeLoops(CharMatrix &DepMatrix,
	unsigned OuterLoopId) {	unsigned InnerLoopId,
		unsigned OuterLoopId) {

	unsigned NumRows = DepMatrix.size();	unsigned NumRows = DepMatrix.size();
	// For each row check if it is valid to interchange.	// For each row check if it is valid to interchange.
Context not available.
	return nullptr;	return nullptr;
	}	}

		static bool isReductionInstr(Instruction *I) {
		bool FP = I->getType()->isFloatingPointTy();
		bool FastMath = FP && I->hasUnsafeAlgebra();
		switch (I->getOpcode()) {
		default:
		return false;
		case Instruction::PHI:
		return true;
		case Instruction::Sub:
		case Instruction::Add:
		case Instruction::Mul:
		case Instruction::And:
		case Instruction::Or:
		case Instruction::Xor:
		return true;
		case Instruction::FMul:
		case Instruction::FSub:
		case Instruction::FAdd:
		return FastMath;
		}
		return false;
		}

		static bool hasMultipleUsesOf(Instruction *I,
		SmallPtrSetImpl<Instruction *> &Insts) {
		unsigned NumUses = 0;
		for (User::op_iterator Use = I->op_begin(), E = I->op_end(); Use != E;
		++Use) {
		if (Insts.count(dyn_cast<Instruction>(*Use)))
		++NumUses;
		if (NumUses > 1)
		return true;
		}

		return false;
		}

		static bool areAllUsesIn(Instruction *I, SmallPtrSetImpl<Instruction *> &Set) {
		for (User::op_iterator Use = I->op_begin(), E = I->op_end(); Use != E; ++Use)
		if (!Set.count(dyn_cast<Instruction>(*Use)))
		return false;
		return true;
		}

		// TODO: Major parts of isReductionPHI is similar to that used by Loop
		// Vectorizer. Try to refactor this code.
		rengolin: I don't think we should add this code here, not even with a TODO to refactor this, because this TODO will never be done.
		static bool isReductionPHI(PHINode *Phi, Loop *TheLoop) {
		if (Phi->getNumIncomingValues() != 2)
		return false;

		// Reduction variables are only found in the loop header block.
		if (Phi->getParent() != TheLoop->getHeader())
		return false;

		Instruction *ExitInstruction = nullptr;
		// Indicates that we found a reduction operation in our scan.
		bool FoundReduxOp = false;

		// We start with the PHI node and scan for all of the users of this
		// instruction. All users must be instructions that can be used as reduction
		// variables (such as ADD). We must have a single out-of-block user. The cycle
		// must include the original PHI.
		bool FoundStartPHI = false;

		SmallPtrSet<Instruction *, 8> VisitedInsts;
		SmallVector<Instruction *, 8> Worklist;
		Worklist.push_back(Phi);
		VisitedInsts.insert(Phi);

		// A value in the reduction can be used:
		// - By the reduction:
		// - Reduction operation:
		// - One use of reduction value (safe).
		// - Multiple use of reduction value (not safe).
		// - PHI:
		// - All uses of the PHI must be the reduction (safe).
		// - Otherwise, not safe.
		// - By one instruction outside of the loop (safe).
		// - By further instructions outside of the loop (not safe).
		// - By an instruction that is not part of the reduction (not safe).
		// This is either:
		// * An instruction type other than PHI or the reduction operation.
		// * A PHI in the header other than the initial PHI.
		while (!Worklist.empty()) {
		Instruction *Cur = Worklist.back();
		Worklist.pop_back();

		// No Users.
		// If the instruction has no users then this is a broken chain and can't be
		// a reduction variable.
		if (Cur->use_empty())
		return false;

		bool IsAPhi = isa<PHINode>(Cur);

		// A header PHI use other than the original PHI.
		if (Cur != Phi && IsAPhi && Cur->getParent() == Phi->getParent())
		return false;

		// Reductions of instructions such as Div, and Sub is only possible if the
		// LHS is the reduction variable.
		if (!Cur->isCommutative() && !IsAPhi && !isa<SelectInst>(Cur) &&
		!isa<ICmpInst>(Cur) && !isa<FCmpInst>(Cur) &&
		!VisitedInsts.count(dyn_cast<Instruction>(Cur->getOperand(0))))
		return false;

		bool IsReduction = isReductionInstr(Cur);
		if (!IsReduction) {
		DEBUG(dbgs() << "IsReduction failed\n");
		return false;
		}

		// A reduction operation must only have one use of the reduction value.
		if (!IsAPhi && hasMultipleUsesOf(Cur, VisitedInsts))
		return false;

		// All inputs to a PHI node must be a reduction value.
		if (IsAPhi && Cur != Phi && !areAllUsesIn(Cur, VisitedInsts))
		return false;

		// Check whether we found a reduction operator.
		FoundReduxOp |= !IsAPhi;

		// Process users of current instruction. Push non-PHI nodes after PHI nodes
		// onto the stack. This way we are going to have seen all inputs to PHI
		// nodes once we get to them.
		SmallVector<Instruction *, 8> NonPHIs;
		SmallVector<Instruction *, 8> PHIs;
		for (User *U : Cur->users()) {
		Instruction *UI = cast<Instruction>(U);

		// Check if we found the exit user.
		BasicBlock *Parent = UI->getParent();
		if (!TheLoop->contains(Parent)) {
		// Exit if you find multiple outside users or if the header phi node is
		// being used. In this case the user uses the value of the previous
		// iteration, in which case we would loose "VF-1" iterations of the
		// reduction operation if we vectorize.
		if (ExitInstruction != nullptr || Cur == Phi)
		return false;

		// The instruction used by an outside user must be the last instruction
		// before we feed back to the reduction phi. Otherwise, we loose VF-1
		// operations on the value.
		if (std::find(Phi->op_begin(), Phi->op_end(), Cur) == Phi->op_end())
		return false;

		ExitInstruction = Cur;
		continue;
		}

		if (VisitedInsts.insert(UI).second) {
		if (isa<PHINode>(UI))
		PHIs.push_back(UI);
		else
		NonPHIs.push_back(UI);
		}
		// Remember that we completed the cycle.
		if (UI == Phi)
		FoundStartPHI = true;
		}
		Worklist.append(PHIs.begin(), PHIs.end());
		Worklist.append(NonPHIs.begin(), NonPHIs.end());
		}

		if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)
		return false;

		return true;
		}
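The worklist ordering described in the comments above (PHI users pushed first, non-PHI users pushed last, so the non-PHI users are popped first) can be sketched outside LLVM. This is only an illustration under stated assumptions: `ToyNode` and `visitOrder` are hypothetical names, nodes are plain indices, and none of the reduction legality checks are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy node, not an LLVM class. Users holds indices into the graph.
struct ToyNode {
  bool IsPhi;
  std::vector<std::size_t> Users;
};

// Returns the order in which nodes are popped from the worklist, starting at
// Start. For each popped node, unvisited PHI users are pushed before unvisited
// non-PHI users, so the non-PHI users come off the stack first.
std::vector<std::size_t> visitOrder(const std::vector<ToyNode> &Graph,
                                    std::size_t Start) {
  assert(Start < Graph.size() && "start index out of range");
  std::vector<bool> Visited(Graph.size(), false);
  std::vector<std::size_t> Worklist{Start};
  std::vector<std::size_t> Order;
  Visited[Start] = true;

  while (!Worklist.empty()) {
    std::size_t Cur = Worklist.back();
    Worklist.pop_back();
    Order.push_back(Cur);

    std::vector<std::size_t> PHIs, NonPHIs;
    for (std::size_t U : Graph[Cur].Users) {
      if (Visited[U])
        continue;
      Visited[U] = true;
      (Graph[U].IsPhi ? PHIs : NonPHIs).push_back(U);
    }
    // PHIs go on the stack first, non-PHIs last (popped first).
    Worklist.insert(Worklist.end(), PHIs.begin(), PHIs.end());
    Worklist.insert(Worklist.end(), NonPHIs.begin(), NonPHIs.end());
  }
  return Order;
}
```

With a start PHI whose users are another PHI and an add that also feeds that PHI, the add is visited before the PHI it feeds, which is the property the comment relies on.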

	/// LoopInterchangeLegality checks if it is legal to interchange the loop.	/// LoopInterchangeLegality checks if it is legal to interchange the loop.
	class LoopInterchangeLegality {	class LoopInterchangeLegality {
	public:	public:
Context not available.

	private:	private:
	bool tightlyNested(Loop *Outer, Loop *Inner);	bool tightlyNested(Loop *Outer, Loop *Inner);
		bool containsUnsafeInstructionsInHeader(BasicBlock *BB);
		bool checkAllUsesAreReductions(Instruction *Ins, Loop *L);
		bool containsUnsafeInstructionsInLatch(BasicBlock *BB);
		bool populateInductionAndReductions(Loop *L);
		SmallVector<PHINode *, 8> Inductions;
		SmallVector<PHINode *, 8> Reductions;
	Loop *OuterLoop;	Loop *OuterLoop;
	Loop *InnerLoop;	Loop *InnerLoop;

Context not available.
	void splitInnerLoopLatch(Instruction *);	void splitInnerLoopLatch(Instruction *);
	void splitOuterLoopLatch();	void splitOuterLoopLatch();
	void splitInnerLoopHeader();	void splitInnerLoopHeader();
		bool hasReductionPHI(Loop *L);
	bool adjustLoopLinks();	bool adjustLoopLinks();
	void adjustLoopPreheaders();	void adjustLoopPreheaders();
	void adjustOuterLoopPreheader();	void adjustOuterLoopPreheader();
	void adjustInnerLoopPreheader();	void adjustInnerLoopPreheader();
	bool adjustLoopBranches();	bool adjustLoopBranches();
		void updateIncomingBlock(BasicBlock *CurrBlock, BasicBlock *OldPred,
		BasicBlock *NewPred);

	Loop *OuterLoop;	Loop *OuterLoop;
	Loop *InnerLoop;	Loop *InnerLoop;
		rengolin: nitpick, please join this "else if".
		jmolloy: Can you rename this "areAllUsesReductions"? It sounds more like a boolean query, which is what this is. "Check" implies some action.
		jmolloy: You can merge these two if's: if (!UserIns || !ReductionDescriptor::isReductionPHI(UserIns, L, RD)) return false; If you wanted to be really cool... return !std::any(Ins->user_begin(), Ins->user_end(), [](User *U) { PHINode *UserIns = dyn_cast<PHINode>(I); ReductionDescriptor RD; return UserIns && ReductionDescriptor::isReductionPHI(UserIns, L, RD); });
		karthikthecool: Yes, the new naming seems much better. Updated code.
		karthikthecool: Updated code.
		karthikthecool: Wow, that was cool :) Got to learn and use a lambda function. Updated code. But I think it should be: return !std::any_of(Ins->user_begin(), Ins->user_end(), [=](User *U) { PHINode *UserIns = dyn_cast<PHINode>(U); ReductionDescriptor RD; return !UserIns || !ReductionDescriptor::isReductionPHI(UserIns, L, RD); });
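The exchange above turns on which predicate gets negated. A minimal standalone sketch, with no LLVM types: the hypothetical `isAccepted` stands in for "this user is a reduction PHI", and plain integers stand in for users. It shows that `!any_of(not accepted)` is the same query as `all_of(accepted)`, while `!any_of(accepted)` answers a different question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for "this user is a reduction PHI".
bool isAccepted(int User) { return User % 2 == 0; }

// Shape of the author's reply: no user may fail the predicate.
bool allAcceptedViaAnyOf(const std::vector<int> &Users) {
  return !std::any_of(Users.begin(), Users.end(),
                      [](int U) { return !isAccepted(U); });
}

// The same query written directly.
bool allAcceptedViaAllOf(const std::vector<int> &Users) {
  return std::all_of(Users.begin(), Users.end(), isAccepted);
}

// Shape of the first suggestion: true only when no user is accepted at all.
bool noneAccepted(const std::vector<int> &Users) {
  return !std::any_of(Users.begin(), Users.end(), isAccepted);
}

// The two "all accepted" forms must agree on every input.
void checkFormsAgree(const std::vector<int> &Users) {
  assert(allAcceptedViaAnyOf(Users) == allAcceptedViaAllOf(Users));
}
```

Whether `std::all_of` would read better in the actual patch is a style call for the reviewers; the sketch only shows the logical equivalence.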
Context not available.
	bool Changed = true;	bool Changed = true;
	while (!Worklist.empty()) {	while (!Worklist.empty()) {
	LoopVector LoopList = Worklist.pop_back_val();	LoopVector LoopList = Worklist.pop_back_val();
	Changed = processLoopList(LoopList);	Changed = processLoopList(LoopList, F);
	}	}
		rengolin: nitpick, please join this "else if".
		karthikthecool: Updated code.
	return Changed;	return Changed;
	}	}
Context not available.
	return LoopList.size() - 1;	return LoopList.size() - 1;
	}	}

	bool processLoopList(LoopVector LoopList) {	bool processLoopList(LoopVector LoopList, Function &F) {

	bool Changed = false;	bool Changed = false;
	bool containsLCSSAPHI = false;
	CharMatrix DependencyMatrix;	CharMatrix DependencyMatrix;
	if (LoopList.size() < 2) {	if (LoopList.size() < 2) {
	DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n");	DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n");
		jmolloy: just: if (!L->getLoopLatch() || !L->getLoopPredecessor()) return false;
		karthikthecool: Updated code.
Context not available.
	else	else
	LoopNestExit = OuterMostLoopLatchBI->getSuccessor(0);	LoopNestExit = OuterMostLoopLatchBI->getSuccessor(0);

	for (auto I = LoopList.begin(), E = LoopList.end(); I != E; ++I) {	if (isa<PHINode>(LoopNestExit->begin())) {
	Loop *L = *I;	DEBUG(dbgs() << "PHI Nodes in loop nest exit is not handled for now "
	BasicBlock *Latch = L->getLoopLatch();	"since on failure all loops branch to loop nest exit.\n");
	BasicBlock *Header = L->getHeader();	return false;
	if (Latch && Latch != Header && isa<PHINode>(Latch->begin())) {
	containsLCSSAPHI = true;
	break;
	}
	}	}

		jmolloy: Probably a good idea to have some debugging output here saying what PHI failed to be recognized?
		karthikthecool: Added debugging output.
	// TODO: Handle lcssa PHI's. Currently LCSSA PHI's are not handled. Handle
	// the same by splitting the loop latch and adjusting loop links
	// accordingly.
	if (containsLCSSAPHI)
	return false;

	unsigned SelecLoopId = selectLoopForInterchange(LoopList);	unsigned SelecLoopId = selectLoopForInterchange(LoopList);
	// Move the selected loop outwards to the best posible position.	// Move the selected loop outwards to the best posible position.
	for (unsigned i = SelecLoopId; i > 0; i--) {	for (unsigned i = SelecLoopId; i > 0; i--) {
Context not available.

	// Update the DependencyMatrix	// Update the DependencyMatrix
	interChangeDepedencies(DependencyMatrix, i, i - 1);	interChangeDepedencies(DependencyMatrix, i, i - 1);
		DT->recalculate(F);
	#ifdef DUMP_DEP_MATRICIES	#ifdef DUMP_DEP_MATRICIES
	DEBUG(dbgs() << "Dependence after inter change \n");	DEBUG(dbgs() << "Dependence after inter change \n");
	printDepMatrix(DependencyMatrix);	printDepMatrix(DependencyMatrix);
Context not available.
	bool processLoop(LoopVector LoopList, unsigned InnerLoopId,	bool processLoop(LoopVector LoopList, unsigned InnerLoopId,
	unsigned OuterLoopId, BasicBlock *LoopNestExit,	unsigned OuterLoopId, BasicBlock *LoopNestExit,
	std::vector<std::vector<char>> &DependencyMatrix) {	std::vector<std::vector<char>> &DependencyMatrix) {
		jmolloy: Assert that this is 2?
		karthikthecool: Oops. This should be getNumSuccessors(). Updated code and added assertion.

	DEBUG(dbgs() << "Processing Innder Loop Id = " << InnerLoopId	DEBUG(dbgs() << "Processing Innder Loop Id = " << InnerLoopId
	<< " and OuterLoopId = " << OuterLoopId << "\n");	<< " and OuterLoopId = " << OuterLoopId << "\n");
	Loop *InnerLoop = LoopList[InnerLoopId];	Loop *InnerLoop = LoopList[InnerLoopId];
Context not available.
	};	};

		jmolloy: Some debugging output saying why it failed would be nice.
		karthikthecool: Done.
	} // end of namespace	} // end of namespace
		bool LoopInterchangeLegality::checkAllUsesAreReductions(Instruction *Ins,
		Loop *L) {
		jmolloy: I don't like this, you're mutating the content of the class for no good reason. It would be better to explicitly give Inductions and Reductions (a stack local variable) to populateInductionAndReductions(), and rename it to findInductionAndReductions().
		karthikthecool: Updated code.
		for (auto I = Ins->user_begin(), E = I->user_end(); I != E; ++I) {
		PHINode *UserIns = dyn_cast<PHINode>(*I);
		if (!UserIns)
		return false;
		if (!isReductionPHI(UserIns, L))
		return false;
		}
		return true;
		}

	static bool containsUnsafeInstructions(BasicBlock *BB) {	bool LoopInterchangeLegality::containsUnsafeInstructionsInHeader(
		BasicBlock *BB) {
	for (auto I = BB->begin(), E = BB->end(); I != E; ++I) {	for (auto I = BB->begin(), E = BB->end(); I != E; ++I) {
		// Load corresponding to reduction PHI's are safe while concluding if
		// tightly nested.
		if (LoadInst *L = dyn_cast<LoadInst>(I)) {
		if (!checkAllUsesAreReductions(L, InnerLoop))
		return true;
		} else {
	if (I->mayHaveSideEffects() || I->mayReadFromMemory())	if (I->mayHaveSideEffects() || I->mayReadFromMemory())
	return true;	return true;
		}
	}	}
	return false;	return false;
	}	}

		bool LoopInterchangeLegality::containsUnsafeInstructionsInLatch(
		BasicBlock *BB) {
		for (auto I = BB->begin(), E = BB->end(); I != E; ++I) {
		// Stores corresponding to reductions are safe while concluding if tightly
		// nested.
		if (StoreInst *L = dyn_cast<StoreInst>(I)) {
		PHINode *PHI = dyn_cast<PHINode>(L->getOperand(0));
		if (!PHI)
		return true;
		} else {
		if (I->mayHaveSideEffects() || I->mayReadFromMemory())
		return true;
		}
		}
		return false;
		}

	bool LoopInterchangeLegality::tightlyNested(Loop *OuterLoop, Loop *InnerLoop) {	bool LoopInterchangeLegality::tightlyNested(Loop *OuterLoop, Loop *InnerLoop) {
	BasicBlock *OuterLoopHeader = OuterLoop->getHeader();	BasicBlock *OuterLoopHeader = OuterLoop->getHeader();
	BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();	BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
Context not available.
	DEBUG(dbgs() << "Checking instructions in Loop header and Loop latch \n");	DEBUG(dbgs() << "Checking instructions in Loop header and Loop latch \n");
	// We do not have any basic block in between now make sure the outer header	// We do not have any basic block in between now make sure the outer header
	// and outer loop latch doesnt contain any unsafe instructions.	// and outer loop latch doesnt contain any unsafe instructions.
	if (containsUnsafeInstructions(OuterLoopHeader) ||	if (containsUnsafeInstructionsInHeader(OuterLoopHeader) ||
	containsUnsafeInstructions(OuterLoopLatch))	containsUnsafeInstructionsInLatch(OuterLoopLatch))
	return false;	return false;

	DEBUG(dbgs() << "Loops are perfectly nested \n");	DEBUG(dbgs() << "Loops are perfectly nested \n");
Context not available.
	return true;	return true;
	}	}

	static unsigned getPHICount(BasicBlock *BB) {
	unsigned PhiCount = 0;
	for (auto I = BB->begin(); isa<PHINode>(I); ++I)
	PhiCount++;
	return PhiCount;
	}

	bool LoopInterchangeLegality::isLoopStructureUnderstood(	bool LoopInterchangeLegality::isLoopStructureUnderstood(
	PHINode *InnerInduction) {	PHINode *InnerInduction) {
Context not available.
	return true;	return true;
	}	}

		bool LoopInterchangeLegality::populateInductionAndReductions(Loop *L) {
		if (L->getLoopLatch() == nullptr || L->getLoopPredecessor() == nullptr)
		return false;
		for (BasicBlock::iterator I = L->getHeader()->begin(); isa<PHINode>(I); ++I) {
		PHINode *PHI = cast<PHINode>(I);
		ConstantInt *StepValue = nullptr;
		if (isInductionPHI(PHI, SE, StepValue))
		Inductions.push_back(PHI);
		else if (isReductionPHI(PHI, L))
		Reductions.push_back(PHI);
		else
		return false;
		}
		return true;
		}

		static bool containsSafePHI(BasicBlock *Block, bool isOuterLoopExitBlock) {
		for (auto I = Block->begin(); isa<PHINode>(I); ++I) {
		PHINode *PHI = cast<PHINode>(I);
		// Reduction lcssa phi will have only 1 incoming block that from loop latch.
		if (PHI->getNumIncomingValues() > 1)
		return false;
		Instruction *Ins = dyn_cast<Instruction>(PHI->getIncomingValue(0));
		if (!Ins)
		return false;
		// Incoming value for lcssa phi's in outer loop exit can only be inner loop
		// exits lcssa phi else it would not be tightly nested.
		if (!isa<PHINode>(Ins) && isOuterLoopExitBlock)
		return false;
		}
		return true;
		}

		static BasicBlock *getLoopLatchExitBlock(BasicBlock *LatchBlock,
		BasicBlock *LoopHeader) {
		if (BranchInst *BI = dyn_cast<BranchInst>(LatchBlock->getTerminator())) {
		unsigned Num = BI->getNumOperands();
		for (unsigned i = 0; i < Num; ++i) {
		if (BI->getSuccessor(i) == LoopHeader)
		continue;
		return BI->getSuccessor(i);
		}
		}
		return nullptr;
		}
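The helper above returns the first successor of the latch terminator that is not the loop header. One review reply in this thread notes that a getNumOperands() call should have been getNumSuccessors(); the sketch below assumes that remark applies to a loop like this one. In LLVM a conditional branch has three operands (the condition plus two successors) but only two successors, so the loop bound matters. `ToyBranch`, `NoBlock` and `latchExitBlock` are hypothetical names, not LLVM API, and blocks are plain integer ids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy branch, not an LLVM class. Blocks are integer ids.
struct ToyBranch {
  bool HasCondition;
  std::vector<int> Successors; // one entry if unconditional, two if conditional

  // Mirrors the operand/successor distinction: the condition is an operand
  // but not a successor.
  std::size_t numOperands() const {
    return Successors.size() + (HasCondition ? 1 : 0);
  }
  std::size_t numSuccessors() const { return Successors.size(); }
};

constexpr int NoBlock = -1;

// Returns the first successor that is not Header, or NoBlock if there is none.
// The loop is bounded by the successor count, never by the operand count.
int latchExitBlock(const ToyBranch &Latch, int Header) {
  assert(!Latch.Successors.empty() && "a branch needs at least one successor");
  for (std::size_t I = 0, E = Latch.numSuccessors(); I != E; ++I)
    if (Latch.Successors[I] != Header)
      return Latch.Successors[I];
  return NoBlock;
}
```

For a conditional latch, bounding the loop by `numOperands()` would index one past the last successor.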

	// This function indicates the current limitations in the transform as a result	// This function indicates the current limitations in the transform as a result
	// of which we do not proceed.	// of which we do not proceed.
	bool LoopInterchangeLegality::currentLimitations() {	bool LoopInterchangeLegality::currentLimitations() {
Context not available.

	BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();	BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
	BasicBlock *InnerLoopHeader = InnerLoop->getHeader();	BasicBlock *InnerLoopHeader = InnerLoop->getHeader();
	BasicBlock *OuterLoopHeader = OuterLoop->getHeader();
	BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();	BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();
	BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();	BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();
		BasicBlock *OuterLoopHeader = OuterLoop->getHeader();

	PHINode *InnerInductionVar;	PHINode *InnerInductionVar;
	PHINode *OuterInductionVar;	if (!populateInductionAndReductions(InnerLoop))

	// We currently handle only 1 induction variable inside the loop. We also do
	// not handle reductions as of now.
	if (getPHICount(InnerLoopHeader) > 1)
	return true;	return true;

	if (getPHICount(OuterLoopHeader) > 1)	// TODO: Currently we handle only loops with 1 induction variable.
		if (Inductions.size() != 1)
	return true;	return true;
		InnerInductionVar = Inductions.pop_back_val();
		Reductions.clear();
		if (!populateInductionAndReductions(OuterLoop))
		return true;

	InnerInductionVar = getInductionVariable(InnerLoop, SE);	// Outer loop cannot have reduction because then loops will not be tightly
	OuterInductionVar = getInductionVariable(OuterLoop, SE);	// nested.
		if (!Reductions.empty())
	if (!OuterInductionVar || !InnerInductionVar) {
	DEBUG(dbgs() << "Induction variable not found\n");
	return true;	return true;
	}	// TODO: Currently we handle only loops with 1 induction variable.
		if (Inductions.size() != 1)
		return true;

	// TODO: Triangular loops are not handled for now.	// TODO: Triangular loops are not handled for now.
	if (!isLoopStructureUnderstood(InnerInductionVar)) {	if (!isLoopStructureUnderstood(InnerInductionVar)) {
Context not available.
	return true;	return true;
	}	}

	// TODO: Loops with LCSSA PHI's are currently not handled.	// TODO: We only handle LCSSA PHI's corresponding to reduction for now.
	if (isa<PHINode>(OuterLoopLatch->begin())) {	BasicBlock *LoopExitBlock =
	DEBUG(dbgs() << "Found and LCSSA PHI in outer loop latch\n");	getLoopLatchExitBlock(OuterLoopLatch, OuterLoopHeader);
		if (!LoopExitBlock || !containsSafePHI(LoopExitBlock, true))
	return true;	return true;
	}
	if (InnerLoopLatch != InnerLoopHeader &&	LoopExitBlock = getLoopLatchExitBlock(InnerLoopLatch, InnerLoopHeader);
	isa<PHINode>(InnerLoopLatch->begin())) {	if (!LoopExitBlock || !containsSafePHI(LoopExitBlock, false))
	DEBUG(dbgs() << "Found and LCSSA PHI in inner loop latch\n");
	return true;	return true;
	}

	// TODO: Current limitation: Since we split the inner loop latch at the point	// TODO: Current limitation: Since we split the inner loop latch at the point
	// were induction variable is incremented (induction.next); We cannot have	// were induction variable is incremented (induction.next); We cannot have
Context not available.
	InnerLoopPreHeader = InsertPreheaderForLoop(InnerLoop, CurrentPass);	InnerLoopPreHeader = InsertPreheaderForLoop(InnerLoop, CurrentPass);
	}	}

	// Check if the loops are tightly nested.
	if (!tightlyNested(OuterLoop, InnerLoop)) {
	DEBUG(dbgs() << "Loops not tightly nested\n");
	return false;
	}

	// TODO: The loops could not be interchanged due to current limitations in the	// TODO: The loops could not be interchanged due to current limitations in the
	// transform module.	// transform module.
	if (currentLimitations()) {	if (currentLimitations()) {
Context not available.
	return false;	return false;
	}	}

		// Check if the loops are tightly nested.
		if (!tightlyNested(OuterLoop, InnerLoop)) {
		DEBUG(dbgs() << "Loops not tightly nested\n");
		return false;
		}

	return true;	return true;
	}	}

		jmolloy: You found this out in LoopInterchangeLegality, but threw away the result. Why recalculate it here?
		jmolloy: s/TODO/FIXME
		jmolloy: Just: for (auto U : PHI->users())
		jmolloy: Needs a bailout if I is not a PHINode.
		karthikthecool: Done.
		karthikthecool: I think this for loop was redundant. I wanted to replace all uses of PHI with the incoming value from the header; PHI->replaceAllUsesWith(V); will suffice. The loop is not required, so I deleted it.
		karthikthecool: Updated code to use "cast" instead of "dyn_cast", as I will always be a PHINode if it enters the for loop.
		karthikthecool: Modified code. Reusing the value calculated by LoopInterchangeLegality in LoopInterchangeTransform.
Context not available.
	OuterLoopLatch->getFirstNonPHI(), DT, LI);	OuterLoopLatch->getFirstNonPHI(), DT, LI);
	}	}

		bool LoopInterchangeTransform::hasReductionPHI(Loop *L) {
		BasicBlock *LoopHeader = L->getHeader();
		for (auto I = LoopHeader->begin(); isa<PHINode>(I); ++I) {
		PHINode *PHI = cast<PHINode>(I);
		if (isReductionPHI(PHI, L))
		return true;
		}
		return false;
		}

	void LoopInterchangeTransform::splitInnerLoopHeader() {	void LoopInterchangeTransform::splitInnerLoopHeader() {

	// Split the inner loop header out.	// Split the inner loop header out. Here make sure that the reduction PHI's
		// stay in the innerloop body.
	BasicBlock *InnerLoopHeader = InnerLoop->getHeader();	BasicBlock *InnerLoopHeader = InnerLoop->getHeader();
	SplitBlock(InnerLoopHeader, InnerLoopHeader->getFirstNonPHI(), DT, LI);	BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();
		if (hasReductionPHI(InnerLoop)) {
		// TODO: Check if the induction PHI will always be the first PHI.
		BasicBlock *New = InnerLoopHeader->splitBasicBlock(
		++(InnerLoopHeader->begin()), InnerLoopHeader->getName() + ".split");
		if (LI)
		if (Loop *L = LI->getLoopFor(InnerLoopHeader))
		L->addBasicBlockToLoop(New, *LI);

		// Adjust Reduction PHI's in the block.
		SmallVector<PHINode *, 8> PHIVec;
		for (auto I = New->begin(); isa<PHINode>(I); ++I) {
		PHINode *PHI = dyn_cast<PHINode>(I);
		Value *V = PHI->getIncomingValueForBlock(InnerLoopPreHeader);
		for (auto UI = PHI->user_begin(), UE = PHI->user_end(); UI != UE; ++UI) {
		PHI->replaceAllUsesWith(V);
		}
		PHIVec.push_back((PHI));
		}
		for (auto I = PHIVec.begin(), E = PHIVec.end(); I != E; ++I) {
		PHINode *P = *I;
		P->eraseFromParent();
		}
		} else {
		SplitBlock(InnerLoopHeader, InnerLoopHeader->getFirstNonPHI(), DT, LI);
		}

	DEBUG(dbgs() << "Output of splitInnerLoopHeader InnerLoopHeaderSucc & "	DEBUG(dbgs() << "Output of splitInnerLoopHeader InnerLoopHeaderSucc & "
	"InnerLoopHeader \n");	"InnerLoopHeader \n");
	}	}
Context not available.
	moveBBContents(InnerLoopPreHeader, OuterHeader->getTerminator());	moveBBContents(InnerLoopPreHeader, OuterHeader->getTerminator());
	}	}

		void LoopInterchangeTransform::updateIncomingBlock(BasicBlock *CurrBlock,
		BasicBlock *OldPred,
		BasicBlock *NewPred) {
		for (auto I = CurrBlock->begin(); isa<PHINode>(I); ++I) {
		PHINode *PHI = dyn_cast<PHINode>(I);
		unsigned Num = PHI->getNumIncomingValues();
		for (unsigned i = 0; i < Num; ++i) {
		if (PHI->getIncomingBlock(i) == OldPred)
		PHI->setIncomingBlock(i, NewPred);
		}
		}
		}
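The effect of updateIncomingBlock above can be shown on toy data: every PHI entry that names the old predecessor is pointed at the new one, and the incoming values stay untouched. `ToyPhi` and `retargetIncoming` are hypothetical names for this sketch, not LLVM API, and blocks and values are plain integers.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical toy PHI: a list of (incoming block id, incoming value) pairs.
struct ToyPhi {
  std::vector<std::pair<int, int>> Incoming;
};

// Rewrites every incoming-block entry equal to OldPred so that it names
// NewPred instead. Returns the number of entries rewritten.
unsigned retargetIncoming(std::vector<ToyPhi> &Phis, int OldPred, int NewPred) {
  assert(OldPred != NewPred && "nothing to retarget");
  unsigned Rewritten = 0;
  for (ToyPhi &Phi : Phis)
    for (auto &Entry : Phi.Incoming)
      if (Entry.first == OldPred) {
        Entry.first = NewPred;
        ++Rewritten;
      }
  return Rewritten;
}
```

Returning a count is a choice made for the sketch so the result can be checked; the function in the patch returns void.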

	bool LoopInterchangeTransform::adjustLoopBranches() {	bool LoopInterchangeTransform::adjustLoopBranches() {

	DEBUG(dbgs() << "adjustLoopBranches called\n");	DEBUG(dbgs() << "adjustLoopBranches called\n");
Context not available.
	OuterLoopHeaderBI->setSuccessor(i, InnerLoopHeaderSucessor);	OuterLoopHeaderBI->setSuccessor(i, InnerLoopHeaderSucessor);
	}	}

		// Adjust reduction PHI's now that the incoming block has changed.
		updateIncomingBlock(InnerLoopHeaderSucessor, InnerLoopHeader,
		OuterLoopHeader);

	BranchInst::Create(OuterLoopPreHeader, InnerLoopHeaderBI);	BranchInst::Create(OuterLoopPreHeader, InnerLoopHeaderBI);
	InnerLoopHeaderBI->eraseFromParent();	InnerLoopHeaderBI->eraseFromParent();

Context not available.
	InnerLoopLatchPredecessorBI->setSuccessor(i, InnerLoopLatchSuccessor);	InnerLoopLatchPredecessorBI->setSuccessor(i, InnerLoopLatchSuccessor);
	}	}

		// Adjust PHI nodes in InnerLoopLatchSuccessor. Update all uses of PHI with
		// the value and remove this PHI node from inner loop.
		SmallVector<PHINode *, 8> LcssaVec;
		for (auto I = InnerLoopLatchSuccessor->begin(); isa<PHINode>(I); ++I) {
		PHINode *LcssaPhi = cast<PHINode>(I);
		LcssaVec.push_back(LcssaPhi);
		}
		for (auto I = LcssaVec.begin(), E = LcssaVec.end(); I != E; ++I) {
		PHINode *P = *I;
		Value *Incoming = P->getIncomingValueForBlock(InnerLoopLatch);
		P->replaceAllUsesWith(Incoming);
		P->eraseFromParent();
		}

	if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopHeader)	if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopHeader)
	OuterLoopLatchSuccessor = OuterLoopLatchBI->getSuccessor(1);	OuterLoopLatchSuccessor = OuterLoopLatchBI->getSuccessor(1);
	else	else
Context not available.
	else	else
	InnerLoopLatchBI->setSuccessor(0, OuterLoopLatchSuccessor);	InnerLoopLatchBI->setSuccessor(0, OuterLoopLatchSuccessor);

		updateIncomingBlock(OuterLoopLatchSuccessor, OuterLoopLatch, InnerLoopLatch);

	if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopLatchSuccessor) {	if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopLatchSuccessor) {
	OuterLoopLatchBI->setSuccessor(0, InnerLoopLatch);	OuterLoopLatchBI->setSuccessor(0, InnerLoopLatch);
	} else {	} else {
Context not available.
	BranchInst *InnerTermBI =	BranchInst *InnerTermBI =
	cast<BranchInst>(InnerLoopPreHeader->getTerminator());	cast<BranchInst>(InnerLoopPreHeader->getTerminator());

	BasicBlock *HeaderSplit =
	SplitBlock(OuterLoopHeader, OuterLoopHeader->getTerminator(), DT, LI);
	Instruction *InsPoint = HeaderSplit->getFirstNonPHI();
	// These instructions should now be executed inside the loop.	// These instructions should now be executed inside the loop.
	// Move instruction into a new block after outer header.	// Move instruction into a new block after outer header.
	moveBBContents(InnerLoopPreHeader, InsPoint);	moveBBContents(InnerLoopPreHeader, OuterLoopHeader->getTerminator());
	// These instructions were not executed previously in the loop so move them to	// These instructions were not executed previously in the loop so move them to
	// the older inner loop preheader.	// the older inner loop preheader.
	moveBBContents(OuterLoopPreHeader, InnerTermBI);	moveBBContents(OuterLoopPreHeader, InnerTermBI);
Context not available.

test/Transforms/LoopInterchange/reductions.ll

				; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				rengolin: This is a generic test, it must not contain a target triple or it will fail on all arches minus x86_64. We do want to test this on ARM, MIPS, PPC, etc., so we should remove the triple and make sure it works on all buildbots.
				karthikthecool: Won't this only run if the triple is x86_64-unknown-linux-gnu? But I agree I need to add some general tests as well.
				rengolin: No. To force it to run only on one architecture is to either REQUIRE: x86_64 or to move it into a directory that is platform-specific, with a lit.cfg file that does the same thing. But we want neither. Since this is not a platform pass, I think this optimisation should be tested on all platforms, unless we have a good reason not to. Taking away the triple will ensure that the target will be picked as the host, which is what we want. The IR might have to change to accommodate other targets. I can test it on ARM/AArch64 to be sure.
				karthikthecool: Interesting, because I had added some test cases with a target triple in the initial check-in of this pass and they seem to pass. Also, I saw similar tests in LoopReroll, which I took as reference. But I agree we need generic tests. I will add/update the same. Wow, it would be great if you could help me with the results on ARM/AArch64. I will try out the same as well.
				@A = common global [500 x [500 x i32]] zeroinitializer
				@X = common global i32 0
				@B = common global [500 x [500 x i32]] zeroinitializer
				@Y = common global i32 0

				;; for( int i=1;i<N;i++)
				;; for( int j=1;j<N;j++)
				;; X+=A[j][i];

				define void @reduction_01(i32 %N) {
				entry:
				%cmp16 = icmp sgt i32 %N, 1
				br i1 %cmp16, label %for.body3.lr.ph, label %for.end8

				for.body3.lr.ph: ; preds = %entry, %for.cond1.for.inc6_crit_edge
				%indvars.iv18 = phi i64 [ %indvars.iv.next19, %for.cond1.for.inc6_crit_edge ], [ 1, %entry ]
				%X.promoted = load i32, i32* @X
				br label %for.body3

				for.body3: ; preds = %for.body3, %for.body3.lr.ph
				%indvars.iv = phi i64 [ 1, %for.body3.lr.ph ], [ %indvars.iv.next, %for.body3 ]
				%add15 = phi i32 [ %X.promoted, %for.body3.lr.ph ], [ %add, %for.body3 ]
				%arrayidx5 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv18
				%0 = load i32, i32* %arrayidx5
				%add = add nsw i32 %add15, %0
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %N
				br i1 %exitcond, label %for.cond1.for.inc6_crit_edge, label %for.body3

				for.cond1.for.inc6_crit_edge: ; preds = %for.body3
				store i32 %add, i32* @X
				%indvars.iv.next19 = add nuw nsw i64 %indvars.iv18, 1
				%lftr.wideiv20 = trunc i64 %indvars.iv.next19 to i32
				%exitcond21 = icmp eq i32 %lftr.wideiv20, %N
				br i1 %exitcond21, label %for.end8, label %for.body3.lr.ph

				for.end8: ; preds = %for.cond1.for.inc6_crit_edge, %entry
				ret void
				}

				; CHECK-LABEL: @reduction_01
				; CHECK: for.body3: ; preds = %for.body3.preheader, %for.body3.split
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader ]
				; CHECK: br label %for.body3.lr.ph.preheader
				; CHECK: for.body3.split1: ; preds = %for.body3.lr.ph
				rengolin: nitpick: you don't need to match %0 here. It won't match if some pass adds a new unrelated operation. Just match up to promoted.
				; CHECK: %arrayidx5 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv18
				; CHECK: %0 = load i32, i32* %arrayidx5
				; CHECK: %add = add nsw i32 %X.promoted, %0
				; CHECK: br label %for.cond1.for.inc6_crit_edge
				; CHECK: for.body3.split: ; preds = %for.cond1.for.inc6_crit_edge
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv.next to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %N
				; CHECK: br i1 %exitcond, label %for.end8.loopexit, label %for.body3
				; CHECK: for.cond1.for.inc6_crit_edge: ; preds = %for.body3.split1
				; CHECK: store i32 %add, i32* @X
				; CHECK: %indvars.iv.next19 = add nuw nsw i64 %indvars.iv18, 1
				; CHECK: %lftr.wideiv20 = trunc i64 %indvars.iv.next19 to i32
				; CHECK: %exitcond21 = icmp eq i32 %lftr.wideiv20, %N
				; CHECK: br i1 %exitcond21, label %for.body3.split, label %for.body3.lr.ph
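At the source level, the reduction_01 test corresponds to the loop nest in its leading comment. A minimal sketch of that nest before and after interchange, written as ordinary C++ rather than IR: the function names and the `Matrix` alias are hypothetical, and the accumulator is passed in and returned instead of living in a global as `@X` does in the test. Barring overflow, integer addition is associative and commutative, so both orders produce the same sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Original nest: i is the outer loop, j the inner loop, so A is walked
// column by column (the row index changes fastest).
long long reduceOriginal(const Matrix &A, std::size_t N, long long X) {
  assert(A.size() >= N && "matrix has too few rows");
  for (std::size_t I = 1; I < N; ++I)
    for (std::size_t J = 1; J < N; ++J)
      X += A[J][I];
  return X;
}

// Interchanged nest: j is now the outer loop, so each row of A is read
// contiguously.
long long reduceInterchanged(const Matrix &A, std::size_t N, long long X) {
  assert(A.size() >= N && "matrix has too few rows");
  for (std::size_t J = 1; J < N; ++J)
    for (std::size_t I = 1; I < N; ++I)
      X += A[J][I];
  return X;
}
```

The sketch says nothing about whether the pass finds this interchange profitable; it only illustrates why reordering the iterations of an integer sum reduction preserves the result.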


				;; Test for more than 1 reductions inside a loop.
				;; for( int i=1;i<N;i++)
				;; for( int j=1;j<N;j++)
				;; for( int k=1;k<N;k++) {
				;; X+=A[k][j];
				;; Y+=B[k][i];
				;; }

				define void @reduction_02(i32 %N) {
				entry:
				%cmp34 = icmp sgt i32 %N, 1
				br i1 %cmp34, label %for.cond4.preheader.preheader, label %for.end19

				for.cond4.preheader.preheader: ; preds = %entry, %for.inc17
				%indvars.iv40 = phi i64 [ %indvars.iv.next41, %for.inc17 ], [ 1, %entry ]
				br label %for.body6.lr.ph

				for.body6.lr.ph: ; preds = %for.cond4.for.inc14_crit_edge, %for.cond4.preheader.preheader
				%indvars.iv36 = phi i64 [ %indvars.iv.next37, %for.cond4.for.inc14_crit_edge ], [ 1, %for.cond4.preheader.preheader ]
				%X.promoted = load i32, i32* @X
				%Y.promoted = load i32, i32* @Y
				br label %for.body6

				for.body6: ; preds = %for.body6, %for.body6.lr.ph
				%indvars.iv = phi i64 [ 1, %for.body6.lr.ph ], [ %indvars.iv.next, %for.body6 ]
				%add1331 = phi i32 [ %Y.promoted, %for.body6.lr.ph ], [ %add13, %for.body6 ]
				%add30 = phi i32 [ %X.promoted, %for.body6.lr.ph ], [ %add, %for.body6 ]
				%arrayidx8 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv36
				%0 = load i32, i32* %arrayidx8
				%add = add nsw i32 %add30, %0
				%arrayidx12 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv40
				%1 = load i32, i32* %arrayidx12
				%add13 = add nsw i32 %add1331, %1
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %N
				br i1 %exitcond, label %for.cond4.for.inc14_crit_edge, label %for.body6

				for.cond4.for.inc14_crit_edge: ; preds = %for.body6
				store i32 %add, i32* @X
				store i32 %add13, i32* @Y
				%indvars.iv.next37 = add nuw nsw i64 %indvars.iv36, 1
				%lftr.wideiv38 = trunc i64 %indvars.iv.next37 to i32
				%exitcond39 = icmp eq i32 %lftr.wideiv38, %N
				br i1 %exitcond39, label %for.inc17, label %for.body6.lr.ph

				for.inc17: ; preds = %for.cond4.for.inc14_crit_edge
				rengolin: Same here. Mainly because %1 and %0 were not part of the rest of the match.
				%indvars.iv.next41 = add nuw nsw i64 %indvars.iv40, 1
				%lftr.wideiv42 = trunc i64 %indvars.iv.next41 to i32
				%exitcond43 = icmp eq i32 %lftr.wideiv42, %N
				br i1 %exitcond43, label %for.end19, label %for.cond4.preheader.preheader

				for.end19: ; preds = %for.inc17, %entry
				ret void
				}

				; CHECK-LABEL: @reduction_02
				; CHECK: for.body6: ; preds = %for.body6.preheader, %for.body6.split
				rengolin: These check lines are bound to fail on multiple architectures, and other optimisations coming in later could break the sequence or reorder instructions. You need to match only what's really relevant, in the right order, with the right arguments stored in variables ("[[foo]]") and stripped of architecture-specific types, to make sure it passes on 16/32/64-bit machines. The same is true for the other tests.
				karthikthecool: The problem here was that I wanted to check that the loops are interchanged properly, and for that I was checking the complete structure of the interchanged loop for correctness. I will try to reduce the checks.
				rengolin: That's fine, but you can't rely on the types and names as much as on the sequence of instructions. Only match types and variables if they must follow a specific pattern. If not, just check the instruction names and their order.
				karthikthecool: Got your point. Will update the tests by tomorrow. Thanks for the comments. :)
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body6.split ], [ 1, %for.body6.preheader ]
				; CHECK: br label %for.cond4.preheader.preheader.preheader
				; CHECK: for.body6.split1: ; preds = %for.body6.lr.ph
				; CHECK: %arrayidx8 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv36
				; CHECK: %0 = load i32, i32* %arrayidx8
				; CHECK: %add = add nsw i32 %X.promoted, %0
				; CHECK: %arrayidx12 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @B, i64 0, i64 %indvars.iv, i64 %indvars.iv40
				; CHECK: %1 = load i32, i32* %arrayidx12
				; CHECK: %add13 = add nsw i32 %Y.promoted, %1
				; CHECK: br label %for.cond4.for.inc14_crit_edge
				; CHECK: for.body6.split: ; preds = %for.inc17
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv.next to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %N
				; CHECK: br i1 %exitcond, label %for.end19.loopexit, label %for.body6
				; CHECK: for.cond4.for.inc14_crit_edge: ; preds = %for.body6.split1
				; CHECK: store i32 %add, i32* @X
				; CHECK: store i32 %add13, i32* @Y
				; CHECK: %indvars.iv.next37 = add nuw nsw i64 %indvars.iv36, 1
				; CHECK: %lftr.wideiv38 = trunc i64 %indvars.iv.next37 to i32
				; CHECK: %exitcond39 = icmp eq i32 %lftr.wideiv38, %N
				; CHECK: br i1 %exitcond39, label %for.inc17, label %for.body6.lr.ph
				; CHECK: for.inc17: ; preds = %for.cond4.for.inc14_crit_edge
				; CHECK: %indvars.iv.next41 = add nuw nsw i64 %indvars.iv40, 1
				; CHECK: %lftr.wideiv42 = trunc i64 %indvars.iv.next41 to i32
				; CHECK: %exitcond43 = icmp eq i32 %lftr.wideiv42, %N
				; CHECK: br i1 %exitcond43, label %for.body6.split, label %for.cond4.preheader.preheader
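The inline review comments on @reduction_02 ask for checks that capture values in FileCheck variables and match only the relevant instructions in order. A minimal sketch of what such relaxed checks might look like is given here. It is an illustration only, not the test that was eventually committed; the variable names GEPA, LDA, GEPB and LDB are hypothetical, and it assumes the promoted values keep the names %X.promoted and %Y.promoted from the input IR.

```llvm
; Sketch only: relaxed checks for @reduction_02 (hypothetical variable names).
; [[NAME:regex]] captures a value name on first use; [[NAME]] must match it later.
; {{.*}} skips operands whose exact spelling is not relevant to the test.
; CHECK-LABEL: @reduction_02
; CHECK: [[GEPA:%[a-zA-Z0-9_.]+]] = getelementptr inbounds {{.*}} @A
; CHECK: [[LDA:%[a-zA-Z0-9_.]+]] = load i32, i32* [[GEPA]]
; CHECK: add nsw i32 %X.promoted, [[LDA]]
; CHECK: [[GEPB:%[a-zA-Z0-9_.]+]] = getelementptr inbounds {{.*}} @B
; CHECK: [[LDB:%[a-zA-Z0-9_.]+]] = load i32, i32* [[GEPB]]
; CHECK: add nsw i32 %Y.promoted, [[LDB]]
; CHECK: store i32 {{.*}}, i32* @X
; CHECK: store i32 {{.*}}, i32* @Y
```

Written this way, the checks no longer depend on temporaries such as %0 and %1 or on block names, while still verifying that both reductions read the promoted values directly and that their results are stored in order.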


				;; Not tightly nested. Do not interchange.
				;; for( int i=1;i<N;i++)
				;; for( int j=1;j<N;j++) {
				;; for( int k=1;k<N;k++) {
				;; X+=A[k][j];
				;; }
				;; Y+=B[j][i];
				;; }
				define void @reduction_03(i32 %N) {
				entry:
				%cmp35 = icmp sgt i32 %N, 1
				br i1 %cmp35, label %for.cond4.preheader.lr.ph, label %for.end19

				for.cond4.preheader.lr.ph: ; preds = %entry, %for.cond1.for.inc17_crit_edge
				%indvars.iv41 = phi i64 [ %indvars.iv.next42, %for.cond1.for.inc17_crit_edge ], [ 1, %entry ]
				%Y.promoted = load i32, i32* @Y
				br label %for.body6.lr.ph

				for.body6.lr.ph: ; preds = %for.cond4.preheader.lr.ph, %for.cond4.for.end_crit_edge
				%indvars.iv37 = phi i64 [ 1, %for.cond4.preheader.lr.ph ], [ %indvars.iv.next38, %for.cond4.for.end_crit_edge ]
				%add1334 = phi i32 [ %Y.promoted, %for.cond4.preheader.lr.ph ], [ %add13, %for.cond4.for.end_crit_edge ]
				%X.promoted = load i32, i32* @X
				br label %for.body6

				for.body6: ; preds = %for.body6, %for.body6.lr.ph
				%indvars.iv = phi i64 [ 1, %for.body6.lr.ph ], [ %indvars.iv.next, %for.body6 ]
				%add31 = phi i32 [ %X.promoted, %for.body6.lr.ph ], [ %add, %for.body6 ]
				%arrayidx8 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv37
				%0 = load i32, i32* %arrayidx8
				%add = add nsw i32 %add31, %0
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %N
				br i1 %exitcond, label %for.cond4.for.end_crit_edge, label %for.body6

				for.cond4.for.end_crit_edge: ; preds = %for.body6
				store i32 %add, i32* @X
				%arrayidx12 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @B, i64 0, i64 %indvars.iv37, i64 %indvars.iv41
				%1 = load i32, i32* %arrayidx12
				%add13 = add nsw i32 %add1334, %1
				%indvars.iv.next38 = add nuw nsw i64 %indvars.iv37, 1
				%lftr.wideiv39 = trunc i64 %indvars.iv.next38 to i32
				%exitcond40 = icmp eq i32 %lftr.wideiv39, %N
				br i1 %exitcond40, label %for.cond1.for.inc17_crit_edge, label %for.body6.lr.ph

				for.cond1.for.inc17_crit_edge: ; preds = %for.cond4.for.end_crit_edge
				store i32 %add13, i32* @Y
				%indvars.iv.next42 = add nuw nsw i64 %indvars.iv41, 1
				%lftr.wideiv43 = trunc i64 %indvars.iv.next42 to i32
				%exitcond44 = icmp eq i32 %lftr.wideiv43, %N
				br i1 %exitcond44, label %for.end19, label %for.cond4.preheader.lr.ph

				for.end19: ; preds = %for.cond1.for.inc17_crit_edge, %entry
				ret void
				}
				;; Not tightly nested. Do not interchange.
				; CHECK-LABEL: @reduction_03
				; CHECK: for.body6: ; preds = %for.body6.preheader, %for.body6
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body6 ], [ 1, %for.body6.preheader ]
				; CHECK: %add31 = phi i32 [ %add, %for.body6 ], [ %X.promoted, %for.body6.preheader ]
				; CHECK: %arrayidx8 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv37
				; CHECK: %0 = load i32, i32* %arrayidx8
				; CHECK: %add = add nsw i32 %add31, %0
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv.next to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %N
				; CHECK: br i1 %exitcond, label %for.cond4.for.end_crit_edge, label %for.body6



				;; Multiple uses of the reduction are not safe. Do not interchange.
				;; for( int i=1;i<N;i++)
				;; for( int j=1;j<N;j++)
				;; for( int k=1;k<N;k++) {
				;; X+=A[k][j];
				;; Y+=X;
				;; }
				define void @reduction_04(i32 %N) {
				entry:
				%cmp28 = icmp sgt i32 %N, 1
				br i1 %cmp28, label %for.cond4.preheader.preheader, label %for.end15

				for.cond4.preheader.preheader: ; preds = %entry, %for.inc13
				%i.029 = phi i32 [ %inc14, %for.inc13 ], [ 1, %entry ]
				br label %for.body6.lr.ph

				for.body6.lr.ph: ; preds = %for.cond4.for.inc10_crit_edge, %for.cond4.preheader.preheader
				%indvars.iv30 = phi i64 [ %indvars.iv.next31, %for.cond4.for.inc10_crit_edge ], [ 1, %for.cond4.preheader.preheader ]
				%X.promoted = load i32, i32* @X
				%Y.promoted = load i32, i32* @Y
				br label %for.body6

				for.body6: ; preds = %for.body6, %for.body6.lr.ph
				%indvars.iv = phi i64 [ 1, %for.body6.lr.ph ], [ %indvars.iv.next, %for.body6 ]
				%add925 = phi i32 [ %Y.promoted, %for.body6.lr.ph ], [ %add9, %for.body6 ]
				%add24 = phi i32 [ %X.promoted, %for.body6.lr.ph ], [ %add, %for.body6 ]
				%arrayidx8 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				%0 = load i32, i32* %arrayidx8
				%add = add nsw i32 %add24, %0
				%add9 = add nsw i32 %add925, %add
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %N
				br i1 %exitcond, label %for.cond4.for.inc10_crit_edge, label %for.body6

				for.cond4.for.inc10_crit_edge: ; preds = %for.body6
				store i32 %add, i32* @X
				store i32 %add9, i32* @Y
				%indvars.iv.next31 = add nuw nsw i64 %indvars.iv30, 1
				%lftr.wideiv32 = trunc i64 %indvars.iv.next31 to i32
				%exitcond33 = icmp eq i32 %lftr.wideiv32, %N
				br i1 %exitcond33, label %for.inc13, label %for.body6.lr.ph

				for.inc13: ; preds = %for.cond4.for.inc10_crit_edge
				%inc14 = add nuw nsw i32 %i.029, 1
				%exitcond34 = icmp eq i32 %inc14, %N
				br i1 %exitcond34, label %for.end15, label %for.cond4.preheader.preheader

				for.end15: ; preds = %for.inc13, %entry
				ret void
				}

				; CHECK-LABEL: @reduction_04
				; CHECK: for.body6: ; preds = %for.body6.preheader, %for.body6
				; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body6 ], [ 1, %for.body6.preheader ]
				; CHECK: %add925 = phi i32 [ %add9, %for.body6 ], [ %Y.promoted, %for.body6.preheader ]
				; CHECK: %add24 = phi i32 [ %add, %for.body6 ], [ %X.promoted, %for.body6.preheader ]
				; CHECK: %arrayidx8 = getelementptr inbounds [500 x [500 x i32]], [500 x [500 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv30
				; CHECK: %0 = load i32, i32* %arrayidx8
				; CHECK: %add = add nsw i32 %add24, %0
				; CHECK: %add9 = add nsw i32 %add925, %add
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %lftr.wideiv = trunc i64 %indvars.iv.next to i32
				; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %N
				; CHECK: br i1 %exitcond, label %for.cond4.for.inc10_crit_edge, label %for.body6

Revision Contents

Diff 22783

lib/Transforms/IPO/PassManagerBuilder.cpp

lib/Transforms/Scalar/LoopInterchange.cpp

test/Transforms/LoopInterchange/reductions.ll
