
[LoopInterchange] Add support to interchange loops with reductions.
ClosedPublic

Authored by karthikthecool on Mar 13 2015, 2:50 AM.

Details

Summary

Hi Hal,
Please find attached the patch to enable interchange of loops having reductions. The logic to detect a reduction/induction is borrowed from loop vectorizer code.

With this change we are now able to interchange matrix multiplication code such as the one below:

for( int i=1;i<2048;i++)
  for( int j=1;j<2048;j++)
    for( int k=1;k<2048;k++) 
      A[i][j]+=B[i][k]*C[k][j];

into:

for( int k=1;k<2048;k++) 
  for( int i=1;i<2048;i++)
    for( int j=1;j<2048;j++)
      A[i][j]+=B[i][k]*C[k][j];

which now gets vectorized.

We observe a ~3X execution time improvement in the above code.
It would be great if you could let me know your inputs on the same.

Thanks and Regards
Karthik Bhat

Diff Detail

Repository
rL LLVM

Event Timeline

karthikthecool retitled this revision from to [LoopInterchange] Add support to interchange loops with reductions..
karthikthecool updated this object.
karthikthecool edited the test plan for this revision. (Show Details)
karthikthecool added reviewers: hfinkel, jmolloy.

Hi Karthik,

Wouldn't this be better integrated into the vectorizer? Or, at least, could it not duplicate code from the vectorizer?

Also, you have at least two other bold moves:

  1. Moving a lot of internal methods to static. This could be OK, but it needs some further explanation of why it is so.
  2. Moving the interchange pass down the pipeline. Again, it could be the right thing to do, but it also needs some context.

Finally, how did you test and benchmark your changes? "It makes my code 3x faster" is not enough to justify including such a big change. The steps you need to complete to get such a change in are, in order:

  1. Run the "make check-all" tests and make sure they're green
  2. Run the test-suite and make sure it passes *and* doesn't regress performance
  3. Provide more information about the benchmarks you tested it on, relative improvements, regressions, etc.

cheers,
--renato

Hi Renato,
Thanks for looking into the patch. Please find my comments inline.

Wouldn't this be better integrated into the vectorizer? Or, at least, could it not duplicate code from the vectorizer?

Since this is a complete pass, integrating it with the loop vectorizer would be a bit difficult. I can try to move the common functions into a utility file if required.

Also, you have at least two other bold moves:

  1. Moving a lot of internal methods to static. This could be OK, but it needs some further explanation of why it is so.

These methods are helper functions. Helper functions are usually marked static in other passes; I had missed this in my initial commit and hence corrected it here.

  2. Moving the interchange pass down the pipeline. Again, it could be the right thing to do, but it also needs some context.

I had to move the interchange pass down the pipeline and add the LICM and loop unswitch passes after loop interchange because, after the inner loop header/loop latches are split and the loops are interchanged, we get multiple blocks with successors outside the loop (i.e., getExitingBlock returned null). This kind of loop is not handled by the vectorizer, so I had to run LICM and loop unswitch to remove the unconditional branches between blocks in the inner loop.

Finally, how did you test and benchmark your changes?

I have run the LLVM LNT test cases but unfortunately didn't observe much improvement with this patch. I'm planning to run the Phoronix test suites to see if it gives an improvement on some known benchmark. The matrix multiplication code was from one of our internal test cases, which showed an improvement after this patch. I have added a few test cases (positive and negative) and ran make check-all to make sure there are no regressions. I have also written a few local C test cases to check that the outputs are the same after interchange.

  1. Run the "make check-all" tests and make sure they're green

Yes, make check-all passes with this patch. I have also added new test cases to test this feature.

  1. Run the test-suite and make sure it passes *and* doesn't regress performance
  2. Provide more information about the benchmarks that you tested it, relative improvements, regressions, etc.

Yes, I ran the LLVM LNT benchmarks but didn't observe much improvement. No regressions were observed either. I have observed one crash in the Dependency Analysis module which is triggered after this patch. Since this pass is currently disabled by default, I was planning to address that crash separately. Will that be OK?

I will try to get more benchmark data using the Phoronix test suites over the weekend and get back to you.

Thanks once again for looking into the patch. Looking forward to your inputs on the patch.
Regards
Karthik Bhat

karthikthecool edited the test plan for this revision. (Show Details)

Hi Hal,Renato,
I have refactored some common code into functions. I have currently borrowed and modified some functions from the loop vectorizer. Do I need to refactor them into a common utility as well? Functions such as AddReductionVar seem to be a bit tightly bound to the loop vectorizer code.

The second change is in PassManagerBuilder. Running SimplifyCFGPass after LoopInterchange is sufficient to merge and remove the redundant basic blocks (blocks with just an unconditional branch) produced after loop interchange. I have updated the code to reflect the same.

I ran a few Phoronix benchmarks and LNT benchmarks but unfortunately didn't see any improvement/regression due to this patch.

As mentioned in previous comments, post this change, code such as:

void matrixMult(int N, int M, int K) {
  for(int i=0;i<N;i++)
    for(int j=0;j<M;j++)
      for(int k=0;k<K;k++)
        A[i][j]+=B[i][k]*C[k][j];
}

gets vectorized, giving some execution-time improvement during large matrix multiplication.

It would be great if you could let me know your inputs on the same.
Also, is there a matrix multiplication benchmark which I can test to see if the kind of code I mentioned above gets triggered?

Thanks and Regards
Karthik Bhat

rengolin set the repository for this revision to rL LLVM.

I have refactored some common code into functions. I have currently borrowed and modified some functions from the loop vectorizer. Do I need to refactor them into a common utility as well? Functions such as AddReductionVar seem to be a bit tightly bound to the loop vectorizer code.

Yes, they are, and I can see what the problem is. But there is a lot of duplication added by this patch and I'm still uncomfortable. I've added Nadav and Arnold, our loop vectorizer experts, to assist on what to do next.

I strongly suggest against duplication, and the only option I can think of is to spot the pattern while creating the reduction variable. You can create a function to iterate all containing loops and inspect all the ranges to make sure they match your pattern. Early exits should be made if the loop is not deep enough, or the outer loops don't iterate through any of the affected induction variables in your reduction.

The second change is in PassManagerBuilder. Running SimplifyCFGPass after LoopInterchange is sufficient to merge and remove the redundant basic blocks (blocks with just an unconditional branch) produced after loop interchange. I have updated the code to reflect the same.

This is good news. Means that the pass is a lot less dramatic than you anticipated. :) This gives me hope that doing this inside the loop vectorizer can be managed.

I ran a few Phoronix benchmarks and LNT benchmarks but unfortunately didn't see any improvement/regression due to this patch.

I'd say "fortunately", since you haven't introduced any regressions, and that's a great thing!

As mentioned in previous comments, post this change, code such as:

void matrixMult(int N, int M, int K) {
  for(int i=0;i<N;i++)
    for(int j=0;j<M;j++)
      for(int k=0;k<K;k++)
        A[i][j]+=B[i][k]*C[k][j];
}

gets vectorized, giving some execution-time improvement during large matrix multiplication.

It seems we don't have that kind of benchmark on our test suite, and it would be good to have one. I don't know one off the top of my head, but maybe Hal/Nadav/Arnold could help.

cheers,
--renato

Thanks Renato, Tobias for your inputs. Please find my comments inline-

Yes, they are, and I can see what the problem is. But there is a lot of duplication added by this patch and I'm still uncomfortable. I've added Nadav and Arnold, our loop vectorizer experts, to assist on what to do next.

Sure. The functions currently duplicated are isReductionPHI (and its helpers) and isInductionPHI. Is it OK to move them to somewhere like LoopBase? Loop could expose an API to check whether a variable is an induction/reduction variable in the loop; it could then be reused by other modules.

I strongly suggest against duplication, and the only option I can think of is to spot the pattern while creating the reduction variable. You can create a function to iterate all containing loops and inspect all the ranges to make sure they match your pattern. Early exits should be made if the loop is not deep enough, or the outer loops don't iterate through any of the affected induction variables in your reduction.

We do have early exits in the code; they were checked in in the initial version. E.g., the minimum loop depth, dependency checks to see if interchange is safe, etc., are done before populating the reduction/induction PHIs.

This is good news. Means that the pass is a lot less dramatic than you anticipated. :) This gives me hope that doing this inside the loop vectorizer can be managed.

Yes :). But I still feel that this should be a separate pass and should not be moved inside the loop vectorizer. The reason is that loop interchange is not specifically for vectorization; based on the profitability model it can also be used for cache reuse, register reuse, etc. So having it as a separate pass looks like a good option to me. Please let me know if you feel otherwise.

It seems we don't have that kind of benchmark on our test suite, and it would be good to have one. I don't know one off the top of my head, but maybe Hal/Nadav/Arnold could help.

Thanks, Tobias, for pointing out the test case SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemm/gemm.c. But the loop inside the test case is something like:

for (i = 0; i < _PB_NI; i++)
  for (j = 0; j < _PB_NJ; j++) {
    C[i][j] *= beta;
    for (k = 0; k < _PB_NK; ++k)
      C[i][j] += alpha * A[i][k] * B[k][j];
  }

Since loops j and k are not tightly nested, we currently do not interchange these loops. As you mentioned, loop interchange will give better results when it works along with other passes such as loop tiling, loop splitting, etc.

Thanks for spending your valuable time on this patch. Is it OK to go ahead with moving the duplicated code somewhere as a Loop API/function, or could you suggest a better place to reduce the code duplication?
Any other comments about the patch in general are welcome as well. :)
Thanks again for your time.
Regards
Karthik Bhat

Sure. The functions currently duplicated are isReductionPHI (and its helpers) and isInductionPHI. Is it OK to move them to somewhere like LoopBase? Loop could expose an API to check whether a variable is an induction/reduction variable in the loop; it could then be reused by other modules.

I'm not sure this would work, because isReduction/Induction in the vectorizer is very specific to the vectorizer's internal data structures, unless you manage to peel out their common functionality without messing up the vectorizer too much.

I agree that this pass is not just good for vectorizing, but it's applicable to a very specific use case whose whole point is to get vectorized. I'll see if I can get Nadav's/Arnold's attention on this review...

cheers,
--renato

Sure. The functions currently duplicated are isReductionPHI (and its helpers) and isInductionPHI. Is it OK to move them to somewhere like LoopBase? Loop could expose an API to check whether a variable is an induction/reduction variable in the loop; it could then be reused by other modules.

Karthik,

Can you try to refactor those two functions to be generic, and make both the loop vectorizer and interchange use them?

cheers,
--renato

karthikthecool removed rL LLVM as the repository for this revision.

Hi Renato,
Sorry for the delay in following up on this patch; I was stuck with some other work. I was able to refactor isInductionVar out of the LoopVectorizer, and we are using the refactored function in this patch.

Refactoring AddReductionVar, though, seems a bit tricky, as I'm not sure whether we should expose all the enums/structs that are currently used to support it. I'm currently only using a part of that code.

I have added a TODO for this (i.e., refactor isReductionPHI) for now. Is it OK to address the refactoring of isReductionPHI in the future? I wanted to address a few other pending issues in LoopInterchange first.

Please let me know if you feel otherwise; I will then try to refactor the code on priority.
Thanks a lot for your time and help; I really appreciate it.

Regards
Karthik Bhat

Add full context diff.

Hi Hal,Renato,
A gentle ping for review.

Hi Karthik,

We were all at EuroLLVM, and I'm still not back home. Sorry for the delay; I'll look at it first thing tomorrow.

Cheers,
Renato

Hi Karthik,

Copying the vectorizer's reduction detection here is not the way forward. Please, refactor the detection part into a generic function.

My initial guess would be to create a vectorizer common library in lib/Transforms/Utils and move the reduction detection in there, like the other loop utilities, and get the vectorizer and your pass to use that.

cheers,
--renato

lib/Transforms/Scalar/LoopInterchange.cpp
104 ↗(On Diff #22786)

Avoid whitespace/empty-line changes together with code changes.

373 ↗(On Diff #22786)

I don't think we should add this code here, not even with a TODO to refactor this, because this TODO will never be done.

Hi Renato,
Updated the code as per the review comments. Refactored the reduction identification code out of the loop vectorizer and am reusing it in this pass. This code assumes D9046 has been applied.

Please let me know your inputs on this.

Thanks for your continued support. I really appreciate it.

Thanks and Regards
Karthik Bhat

Hi Karthik,

Looks a lot better, thanks!

I'll let @jmolloy review this one, as he was more tuned to this issue. I have no more concerns, thank you.

cheers,
--renato

Hi Renato,
Thanks for the review and time. Updating code to reflect changes done in r235284.

Hi Hal, James,
Could you please share your valuable inputs on this patch.
Thanks and Regards
Karthik Bhat

Hi Karthik,

The patch is looking good, apart from the few comments. I'd welcome @jmolloy's comments.

cheers,
--renato

lib/Transforms/Scalar/LoopInterchange.cpp
611 ↗(On Diff #24012)

nitpick, please join this "else if".

628 ↗(On Diff #24012)

nitpick, please join this "else if".

test/Transforms/LoopInterchange/reductions.ll
3 ↗(On Diff #24012)

This is a generic test; it must not contain a target triple or it will fail on all arches except x86_64. We do want to test this on ARM, MIPS, PPC, etc., so we should remove the triple and make sure it works on all buildbots.

124 ↗(On Diff #24012)

These check lines are bound to fail on multiple architectures, and with other optimisations coming in later, they could break the sequence or reorder instructions. You need to match only what's really relevant, in the right order, with the right arguments stored in variables ("[[foo]]") and stripped of architecture-specific types, to make sure it passes on 16/32/64-bit machines.

The same is true for the other tests.

jmolloy edited edge metadata.Apr 20 2015, 9:22 AM

Hi Karthik,

See my inline comments.

Cheers,

James

lib/Transforms/Scalar/LoopInterchange.cpp
258 ↗(On Diff #24012)

s/Theorm/Theorem

590 ↗(On Diff #24012)

Can you rename this to "areAllUsesReductions"? It sounds more like a boolean query, which is what this is; "Check" implies some action.

595 ↗(On Diff #24012)

You can merge these two if's:

if (!UserIns || !ReductionDescriptor::isReductionPHI(UserIns, L, RD))
  return false;

If you wanted to be *really* cool...

return !std::any(Ins->user_begin(), Ins->user_end(), [](User *U) {
  PHINode *UserIns = dyn_cast<PHINode>(*I);
  ReductionDescriptor RD;
  return UserIns && ReductionDescriptor::isReductionPHI(UserIns, L, RD);
});
696 ↗(On Diff #24012)

just:

if (!L->getLoopLatch() || !L->getLoopPredecessor())
  return false;
707 ↗(On Diff #24012)

Probably a good idea to have some debugging output here saying which PHI failed to be recognized?

732 ↗(On Diff #24012)

Assert that this is 2?

757 ↗(On Diff #24012)

Some debugging output saying why it failed would be nice

760 ↗(On Diff #24012)

I don't like this; you're mutating the contents of the class for no good reason. It would be better to explicitly pass Inductions and Reductions (as stack-local variables) to populateInductionAndReductions(), and rename it to findInductionAndReductions().

1063 ↗(On Diff #24012)

You found this out in LoopInterchangeLegality, but threw away the result. Why recalculate it here?

1081 ↗(On Diff #24012)

s/TODO/FIXME

1093 ↗(On Diff #24012)

Just:

for (auto U : PHI->users())
1138 ↗(On Diff #24012)

Needs a bailout if I is not a PHINode.

Hi James, Renato,
Thanks for the comments. Updated the code to address review comments in LoopInterchange. Please find my comments inline.
Thanks and Regards
Karthik Bhat

lib/Transforms/Scalar/LoopInterchange.cpp
258 ↗(On Diff #24012)

Updated.

590 ↗(On Diff #24012)

Yes, the new naming seems much better. Updated the code.

595 ↗(On Diff #24012)

Wow, that was cool. :) Got to learn and use a lambda function. Updated the code. But I think it should be:

return !std::any_of(Ins->user_begin(), Ins->user_end(), [=](User *U) {
  PHINode *UserIns = dyn_cast<PHINode>(U);
  ReductionDescriptor RD;
  return !UserIns || !ReductionDescriptor::isReductionPHI(UserIns, L, RD);
});
611 ↗(On Diff #24012)

Updated code.

628 ↗(On Diff #24012)

Updated code.

696 ↗(On Diff #24012)

Updated code.

707 ↗(On Diff #24012)

Added Debugging output.

732 ↗(On Diff #24012)

Oops, this should be getNumSuccessors(). Updated the code and added an assertion.

757 ↗(On Diff #24012)

Done.

760 ↗(On Diff #24012)

Updated code.

1063 ↗(On Diff #24012)

Modified the code to reuse the value calculated by LoopInterchangeLegality in LoopInterchangeTransform.

1081 ↗(On Diff #24012)

Done.

1093 ↗(On Diff #24012)

I think this for loop was redundant. I wanted to replace all uses of the PHI with the incoming value from the header, and PHI->replaceAllUsesWith(V); suffices for that. The loop is not required, so I deleted it.

1138 ↗(On Diff #24012)

Updated the code to use "cast" instead of "dyn_cast", as I will always be a PHINode if it enters the for loop.

test/Transforms/LoopInterchange/reductions.ll
3 ↗(On Diff #24012)

Won't this only run if the triple is x86_64-unknown-linux-gnu? But I agree I need to add some general tests as well.

124 ↗(On Diff #24012)

The problem here was that I wanted to check that the loops are interchanged properly, and for that I was checking the complete structure of the interchanged loop for correctness. I will try to reduce the checks.

karthikthecool edited edge metadata.

Hi James,Renato,
Updated LoopInterchange.cpp as per review comments. Please have a look when you find time.
Thanks a lot for your time and guidance.
Thanks and Regards
Karthik Bhat

rengolin added inline comments.Apr 21 2015, 5:28 AM
test/Transforms/LoopInterchange/reductions.ll
3 ↗(On Diff #24012)

No. Forcing it to run only on one architecture means either adding REQUIRES: x86_64 or moving it into a platform-specific directory with a lit.cfg that does the same thing. But we want neither.

Since this is not a platform-specific pass, I think this optimisation should be tested on all platforms, unless we have a good reason not to. Taking away the triple will ensure that the target is picked as the host, which is what we want. The IR might have to change to accommodate other targets. I can test it on ARM/AArch64 to be sure.

124 ↗(On Diff #24012)

That's fine, but you shouldn't rely on the types and names as much as on the sequence of instructions. Only match types and variables if they *must* follow a specific pattern; if not, just check the instruction names and correct order.

karthikthecool added inline comments.Apr 21 2015, 5:38 AM
test/Transforms/LoopInterchange/reductions.ll
3 ↗(On Diff #24012)

Interesting, because I had added some test cases with a target triple in the initial check-in of this pass and they seem to pass.
Also, I saw similar tests in LoopReroll, which I took as a reference.

But I agree we need generic tests; I will add/update the same.
Wow, it would be great if you could help me with the results on ARM/AArch64. I will try out the same as well.

124 ↗(On Diff #24012)

Got your point. I will update the tests by tomorrow.
Thanks for the comments. :)

OK, this looks OK to me now.

I tested on ARM, and it works if you remove the data layout/triple from the test. Once you remove that and make the CHECK lines a bit less specific, it looks good to me too. Thanks!

Hi Renato,
Updated the test cases as per the comments. Please let me know if this looks good to you.
Verified with a Debug+Asserts build and make check-all on x86_64.

Thanks and Regards
Karthik Bhat

rengolin accepted this revision.Apr 22 2015, 9:53 AM
rengolin added a reviewer: rengolin.

Hi Karthik,

Looks good to me with the two nitpicks. Feel free to commit with those changes.

Thanks!
-renato

test/Transforms/LoopInterchange/reductions.ll
49 ↗(On Diff #24198)

nitpick: you don't need to match %0 here. It won't match if some pass adds a new unrelated operation. Just match up to promoted.

113 ↗(On Diff #24198)

Same here, mainly because %1 and %0 were not part of the rest of the match.

This revision is now accepted and ready to land.Apr 22 2015, 9:53 AM
This revision was automatically updated to reflect the committed changes.